David Scott has recently made a number of proposals relating to the properties
of compression routines intended to be used in conjunction with cryptography.
It has been known for some time that compression programs which leave intact
headers in their output can provide clues to analysts that allow
known-partial-plaintext attacks.
People have also been aware that certain types of structure within some
compressed files can provide further clues to aid cryptanalytic attacks.
The fear has been publicly expressed that compression before encryption is
bad, because it introduces an entirely new kind of attack: you can rule out
keys without any knowledge of the plaintext (apart from knowledge of the way
in which it has been compressed) by decyphering using the specified key and
then attempting to decompress. If the decompressor gives errors, you can
eliminate that key from consideration.
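To make the attack concrete, here is a minimal sketch of the idea in Python,
with hypothetical decrypt() and decompress() routines standing in for the
cypher and the (non-one-on-one) decompressor actually in use:

    # Sketch only: decrypt() and decompress() are hypothetical stand-ins
    # for the cypher and decompressor being attacked.
    def surviving_keys(ciphertext, candidate_keys, decrypt, decompress):
        """Keep only those keys whose decryption decompresses without error."""
        survivors = []
        for key in candidate_keys:
            candidate = decrypt(ciphertext, key)
            try:
                decompress(candidate)
            except Exception:
                # The decompressor rejected the data, so this key can be
                # ruled out without any knowledge of the real plaintext.
                continue
            survivors.append(key)
        return survivors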
Simply suppressing errors from the decompressor is easy to do - however
preventing this type of attack requires more subtlety. Even if the
decompressor spits out a valid file for every possible compressed input (and
if the opponent still has convenient access to the compressor as well as the
decompressor) the attacker can try decompressing the file and then compressing
it again to see whether it returns to its original state. If it does not (and
the compressor is deterministic and lossless) the opponent can conclude that
the file was not created by the specified compressor. If he knows that this
compressor was in fact the one used, he can again eliminate that key from
consideration.
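A sketch of that round-trip test, again with hypothetical decrypt(),
decompress() and compress() stand-ins, and assuming the compressor is
deterministic and lossless as described above:

    # Sketch only: decrypt(), decompress() and compress() are hypothetical.
    def key_survives_round_trip(ciphertext, key, decrypt, decompress, compress):
        """A key survives only if the decrypted data decompresses and then
        recompresses back to exactly the same bytes."""
        data = decrypt(ciphertext, key)
        try:
            recovered = decompress(data)
        except Exception:
            return False  # decompression failed outright
        # A deterministic, lossless compressor could not have produced 'data'
        # unless recompressing the recovered file reproduces it exactly.
        return compress(recovered) == data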
Detractors of the idea have protested that modern cyphers make a simple
brute-force attack based on systematically eliminating keys impractical.
Of course there's still some security risk - but people have tried to claim
that this consideration means it is of little significance. If the only
attack one-on-one compression prevents is a brute-force attack on the
keyspace, it really isn't worth bothering with, they say.
Such considerations are based on mistaken premises.
The very fact that a new attack is possible means that the compression routine
(assuming it is deterministic) has systematically added information to the file
that was not present in the plaintext. In effect, when faced with more than
one possible file to compress to, the compressor has "chosen" one of the
possible compressed files over the others by some criterion. In principle an
attacker may be able to analyse any regularities present in these choices, and
use this to get a "foot in the door" of the cypher.
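As a toy illustration (this is not David's scheme), consider a naive
run-length encoder: the string "aaaaa" could legally be represented as one run
of five or as runs of two and three, but a deterministic compressor always
emits one canonical choice - and that choice is information an attacker can
test for:

    # Toy illustration only - a naive run-length scheme, not David's compressor.
    def rle_compress(data: bytes) -> list:
        """Always emit maximal runs: the compressor's canonical 'choice'."""
        runs = []
        i = 0
        while i < len(data):
            j = i
            while j < len(data) and data[j] == data[i]:
                j += 1
            runs.append((data[i], j - i))
            i = j
        return runs

    def rle_decompress(runs: list) -> bytes:
        return b"".join(bytes([value]) * count for value, count in runs)

    # Both of these decompress to b"aaaaa", but only the first could have been
    # produced by rle_compress - so data containing the second can be ruled out.
    canonical = [(ord("a"), 5)]
    non_canonical = [(ord("a"), 2), (ord("a"), 3)]
    assert rle_decompress(canonical) == rle_decompress(non_canonical)
    assert rle_compress(rle_decompress(non_canonical)) != non_canonical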
There is generally no particular reason to think that the attacker will
require the whole of the compressed file to do this. Indeed, when evaluating
the security of the resulting system, unless you can demonstrate otherwise you
should assume the worst possible scenario - that the attacker may be able to
eliminate keys based on small fragments of the compressed file.
David has provided what appears to be a new framework for thinking about these
problems, and a practical demonstration of how they may be eliminated.
David's proposed remedy is simple: make the compression system bijective.
For every possible compressed file there is one, and only one, corresponding
decompressed file (and - assuming the compression systems under consideration
are not lossy - vice versa).
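In other words, compression and decompression are exact inverses of one
another. As a rough sanity check - a sketch with hypothetical compress() and
decompress() functions over byte strings, and no substitute for a proof - the
property could be tested exhaustively on short inputs:

    # Sketch only: compress() and decompress() are hypothetical functions
    # mapping byte strings to byte strings.
    from itertools import product

    def is_one_on_one(compress, decompress, max_len=2):
        """Check both round trips on every byte string up to max_len bytes."""
        for length in range(max_len + 1):
            for combo in product(range(256), repeat=length):
                data = bytes(combo)
                if decompress(compress(data)) != data:
                    return False  # compression loses information here
                if compress(decompress(data)) != data:
                    return False  # this "compressed" file has no unique source
        return True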
David refers to this property of compressors as "one-on-one" - and this is
the term that will generally be used throughout the documents presented here.
Making a compressor "one-on-one" completely eliminates any security problem
introduced by the compressor actively adding information to the file - simply
by reducing the quantity of added information to zero.
In addition to this theorizing, David has built his own one-on-one compressor.
This is based on an adaptive Huffman compression scheme.
Of course the "one-on-one" property does not guarantee a good compression
ratio - and any failure to remove as many regularities as possible from the
file would obviously introduce security problems of its own.