Compression and Security

3. The current position

Problems

One problem with the current situation is that, since the first non-trivial one-on-one compressor known to the general public was invented only recently, very few compression routines have this property.

Regrettably, this means that those wishing to eliminate any regularities in their source statistics before encrypting face an unfortunate choice:

They can use one of the few non-trivial one-on-one compressors currently freely available. However, these will probably fail to compress their target files very well.

Alternatively, they can choose a compressor which targets their data and achieves a good compression ratio. Unfortunately, none of these compressors appears to be one-on-one.
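As a rough illustration of what failing to be one-on-one looks like in practice, the sketch below checks how many arbitrary files a common general-purpose compressor would accept as compressed input. It assumes that "one-on-one" means every possible file is a valid compressed file, and it uses Python's zlib module purely as a convenient stand-in for the targeted compressors mentioned above. The helper name is_valid_zlib_stream is my own.

```python
import zlib

def is_valid_zlib_stream(data: bytes) -> bool:
    """Return True if `data` decompresses cleanly as a complete zlib stream."""
    try:
        zlib.decompress(data)
        return True
    except zlib.error:
        return False

# Output of the compressor itself is, of course, accepted by the decompressor.
genuine = zlib.compress(b"hello world")
genuine_ok = is_valid_zlib_stream(genuine)

# An arbitrary file is usually rejected: it is not the compressed form of anything.
arbitrary_ok = is_valid_zlib_stream(b"hello world")

# Exhaustive check over all 65536 two-byte files.
valid_two_byte_files = sum(
    is_valid_zlib_stream(bytes([a, b])) for a in range(256) for b in range(256)
)

print(genuine_ok, arbitrary_ok, valid_two_byte_files)
```

Any file the decompressor rejects can never be produced by the compressor, so an attacker who tries a key and obtains such a file can discard that key immediately. That is the kind of regularity a one-on-one compressor is intended to avoid.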

Consequently, they have to choose between one type of security risk and another. This is never a pleasant situation to be in - especially since the two risks differ in kind, which makes them hard to quantify and compare.

Consideration of the "optimal" compression ratio

Fortunately, in the future people should not have to choose between security risks in the way that they do today. It is simple to demonstrate that, for a given frequency distribution of target files and an associated desirability of compressing them, all the deterministic compression routines with the best possible compression ratio also happen to be one-on-one.
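The flavour of that demonstration can be seen in a toy case. The sketch below is not the general argument, only a brute-force check under simplifying assumptions of my own: three hypothetical messages with made-up relative frequencies, outputs restricted to bit strings of length at most two, and "best compression" taken to mean the lowest frequency-weighted total output length. It enumerates every lossless deterministic assignment of messages to outputs and looks at which outputs the best ones use.

```python
from itertools import permutations

# Hypothetical toy data: three messages with relative frequencies.
weights = {"a": 5, "b": 3, "c": 2}

# Candidate outputs: every bit string of length 0, 1 or 2, shortest first.
outputs = ["", "0", "1", "00", "01", "10", "11"]

def cost(assignment):
    """Frequency-weighted total output length of one assignment."""
    return sum(weights[m] * len(code) for m, code in assignment.items())

messages = list(weights)

# Every injective (lossless, deterministic) assignment of messages to outputs.
assignments = [
    dict(zip(messages, codes))
    for codes in permutations(outputs, len(messages))
]

best_cost = min(cost(a) for a in assignments)
optimal = [a for a in assignments if cost(a) == best_cost]

# The distinct sets of outputs used by the optimal assignments.
used = {frozenset(a.values()) for a in optimal}

print(best_cost, len(optimal), used)
```

In this toy case every optimal assignment uses exactly the three shortest outputs and leaves no shorter output unused. An assignment that skipped a short output could always be improved by moving a message onto it, which is the finite analogue of the observation that a compressor with wasted outputs cannot have the best possible compression ratio.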

The purpose of these pages

In order to help rectify this situation as rapidly as possible, an instructional recipe for creating one-on-one compression routines which are targeted at particular data types will be presented in these pages.

The main immediate reason for doing this is to assist with the task of one-on-one compression of English text messages, of the type that are currently frequently sent by email. Consequently, the examples I give are targeted at that domain. However, the technique presented is of general applicability, and I have tried to phrase my explanations in such a way as to make it easy to see how to apply the technique in other domains.

tim@tt1.org | http://mandala.co.uk/