David Scott has recently made a number of proposals relating to the properties
of compression routines intended to be used in conjunction with cryptography.
It has been known for some time that compression programs which leave intact
headers in their output can provide clues to analysts that allow
known-partial-plaintext attacks.
People have also been aware that certain types of structure within some
compressed files can provide further clues to aid cryptanalytic attacks.
The fear has been publicly expressed that compression before encryption is
bad, because it introduces an entirely new kind of attack: you can rule out
keys without any knowledge of the plaintext (apart from knowledge of the way
in which it has been compressed) by decyphering using the specified key and
then attempting to decompress. If the decompressor gives errors, you can
eliminate that key from consideration.
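To make the attack concrete, here is a minimal sketch of the idea in Python,
with hypothetical decrypt() and decompress() routines standing in for the
cypher and the (non-one-on-one) decompressor actually in use:

    # Sketch only: decrypt() and decompress() are hypothetical stand-ins
    # for the cypher and decompressor being attacked.
    def surviving_keys(ciphertext, candidate_keys, decrypt, decompress):
        """Keep only those keys whose decryption decompresses without error."""
        survivors = []
        for key in candidate_keys:
            candidate = decrypt(ciphertext, key)
            try:
                decompress(candidate)
            except Exception:
                # The decompressor rejected the data, so this key can be
                # ruled out without any knowledge of the real plaintext.
                continue
            survivors.append(key)
        return survivors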
Simply suppressing errors from the decompressor is easy to do - however
preventing this type of attack requires more subtlety. Even if the
decompressor spits out a valid file for every possible compressed input (and
if the opponent still has convenient access to the compressor as well as the
decompressor) the attacker can try decompressing the file and then compressing
it again to see whether it returns to its original state. If it does not (and
the compressor is deterministic and lossless) the opponent can conclude that
the file was not created by the specified compressor. If he knows that this
compressor was in fact the one used, he can again eliminate that key from
consideration.
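A sketch of that round-trip test, again with hypothetical decrypt(),
decompress() and compress() stand-ins, and assuming the compressor is
deterministic and lossless as described above:

    # Sketch only: decrypt(), decompress() and compress() are hypothetical.
    def key_survives_round_trip(ciphertext, key, decrypt, decompress, compress):
        """A key survives only if the decrypted data decompresses and then
        recompresses back to exactly the same bytes."""
        data = decrypt(ciphertext, key)
        try:
            recovered = decompress(data)
        except Exception:
            return False  # decompression failed outright
        # A deterministic, lossless compressor could not have produced 'data'
        # unless recompressing the recovered file reproduces it exactly.
        return compress(recovered) == data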
Detractors of the idea have protested that modern cyphers make a simple
brute-force attack based on systematically eliminating keys impractical.
Of course there's still some security risk - but people have tried to claim
that this consideration means it is of little significance. If the only
attack one-on-one compression prevents is a brute-force attack on the
keyspace, it really isn't worth bothering with, they say.
Such considerations are based on mistaken premises.
The very fact that a new attack is possible means that the compression routine
(assuming it is deterministic) has systematically added information to the file
that was not present in the plaintext. In effect, when faced with more than
one possible file to compress to, the compressor has "chosen" one of the
possible compressed files over the others by some criterion. In principle an
attacker may be able to analyse any regularities present in these choices, and
use this to get a "foot in the door" of the cypher.
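As a toy illustration (this is not David's scheme), consider a naive
run-length encoder: the string "aaaaa" could legally be represented as one run
of five or as runs of two and three, but a deterministic compressor always
emits one canonical choice - and that choice is information an attacker can
test for:

    # Toy illustration only - a naive run-length scheme, not David's compressor.
    def rle_compress(data: bytes) -> list:
        """Always emit maximal runs: the compressor's canonical 'choice'."""
        runs = []
        i = 0
        while i < len(data):
            j = i
            while j < len(data) and data[j] == data[i]:
                j += 1
            runs.append((data[i], j - i))
            i = j
        return runs

    def rle_decompress(runs: list) -> bytes:
        return b"".join(bytes([value]) * count for value, count in runs)

    # Both of these decompress to b"aaaaa", but only the first could have been
    # produced by rle_compress - so data containing the second can be ruled out.
    canonical = [(ord("a"), 5)]
    non_canonical = [(ord("a"), 2), (ord("a"), 3)]
    assert rle_decompress(canonical) == rle_decompress(non_canonical)
    assert rle_compress(rle_decompress(non_canonical)) != non_canonical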
There is generally no particular reason to think that the attacker will
require the whole of the compressed file to do this. Indeed, when evaluating
the security of the resulting system, unless you can demonstrate otherwise you
should assume the worst possible scenario - that the attacker may be able to
eliminate keys based on small fragments of the compressed file.
David has provided what appears to be a new framework for thinking about these
problems, and a practical demonstration of how they may be eliminated.
David's proposed remedy is simple: make the compression system bijective.
For every possible compressed file there is one, and only one, corresponding
decompressed file (and - assuming the compression systems under consideration
are not lossy - vice versa).
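In other words, compression and decompression are exact inverses of one
another. As a rough sanity check - a sketch with hypothetical compress() and
decompress() functions over byte strings, and no substitute for a proof - the
property could be tested exhaustively on short inputs:

    # Sketch only: compress() and decompress() are hypothetical functions
    # mapping byte strings to byte strings.
    from itertools import product

    def is_one_on_one(compress, decompress, max_len=2):
        """Check both round trips on every byte string up to max_len bytes."""
        for length in range(max_len + 1):
            for combo in product(range(256), repeat=length):
                data = bytes(combo)
                if decompress(compress(data)) != data:
                    return False  # compression loses information here
                if compress(decompress(data)) != data:
                    return False  # this "compressed" file has no unique source
        return True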
David refers to this property of compressors as "one-on-one" - and this is
the term that will generally be used throughout the documents presented here.
Making a compressor "one-on-one" completely eliminates any security problem
introduced by the compressor actively adding information to the file - simply
by reducing the quantity of added information to zero.
In addition to this theorizing, David has built his own one-on-one compressor.
This is based on an adaptive Huffman compression scheme.
Of course the "one-on-one" property does not guarantee a good compression
ratio - and any failure to remove as many regularities as possible from the
file would obviously introduce security problems of its own.