Problems
One problem with the current situation is that, since the first non-trivial
one-on-one compressor known to the general public was invented only recently,
there is a paucity of compression routines which have this property.
Regrettably, this means that those wishing to eliminate any
regularities in their source statistics before encrypting face an unfortunate
choice:
They can use one of the few non-trivial one-on-one compressors
currently freely available. However, these will probably fail to compress
their target files very well.
Alternatively, they can choose a compressor which targets their data
and gets a good compression ratio. Unfortunately, none of these compressors
appears to be one-on-one.
Consequently, they have to choose between one type of security risk and
another. This is never a pleasant situation to be in, especially since
quantifying the qualitatively different risks associated with each type of
problem can be difficult.
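To make the second option concrete, here is a minimal sketch of what failing to be one-on-one looks like in practice. It uses Python's standard zlib module purely as a convenient example of a widely used general-purpose compressor; it is not one of the compressors discussed in these pages, and the helper name is_valid_compressed is my own. A one-on-one compressor would accept every possible file as a valid compressed file. The sketch simply counts how many two-byte strings zlib accepts.

```python
import zlib


def is_valid_compressed(data: bytes) -> bool:
    """Return True if zlib accepts `data` as a complete compressed stream."""
    try:
        zlib.decompress(data)
        return True
    except zlib.error:
        return False


# The ordinary round trip works as expected.
message = b"Attack at dawn"
packed = zlib.compress(message)
assert zlib.decompress(packed) == message

# Try every possible two-byte file as a candidate "compressed" file.
# For a one-on-one compressor, all 65536 of them would decompress to something.
valid_two_byte_files = sum(
    is_valid_compressed(bytes([a, b])) for a in range(256) for b in range(256)
)
print("two-byte files accepted by zlib:", valid_two_byte_files, "of 65536")
```

Any file which the decompressor rejects can never be the output of the compressor. An attacker who tries a key and obtains such a file after decrypting knows immediately that the key was wrong, which is the kind of regularity a one-on-one compressor avoids.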
Consideration of the "optimal" compression ratio
Fortunately, in the future people should not have to choose between security
risks in the way that they do today. It is simple to demonstrate that, for a
given frequency distribution of target files and an associated desirability of
compressing them, all the deterministic compression routines with the best
possible compression ratio also happen to be one-on-one.
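The intuition behind this claim can be illustrated with a toy case. The sketch below is not a proof and makes several assumptions of its own: the four target files "A" to "D" and their probabilities are hypothetical, compressed files are modelled as bit strings whose length is known externally (as a file's length is), and the function names are invented for this illustration. It compares an assignment which uses every short output with one which leaves a short output unused.

```python
from itertools import count, product


def shortlex_binary_strings():
    """Yield '', '0', '1', '00', '01', ... shortest strings first."""
    for length in count(0):
        for bits in product("01", repeat=length):
            yield "".join(bits)


def assign_outputs(probabilities, skipped=None):
    """Give the most probable files the shortest outputs.

    If `skipped` is set, that output is never produced, so the mapping
    has a gap and cannot be one-on-one.
    """
    ranked = sorted(probabilities, key=probabilities.get, reverse=True)
    outputs = (s for s in shortlex_binary_strings() if s != skipped)
    # zip stops once every file has an output, so the infinite generator is safe.
    return dict(zip(ranked, outputs))


def expected_length(assignment, probabilities):
    """Average compressed length in bits, weighted by file probability."""
    return sum(probabilities[f] * len(out) for f, out in assignment.items())


# Hypothetical frequency distribution of target files.
probabilities = {"A": 0.5, "B": 0.25, "C": 0.15, "D": 0.10}

gap_free = assign_outputs(probabilities)
with_gap = assign_outputs(probabilities, skipped="0")

print("gap-free:", gap_free, expected_length(gap_free, probabilities))
print("with gap:", with_gap, expected_length(with_gap, probabilities))
```

Leaving the output "0" unused pushes later files onto longer outputs, so the average length goes up. The same reasoning suggests that a compressor with the best possible ratio cannot afford to have any file which is never produced as output.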
The purpose of these pages
In order to help rectify this situation as rapidly as possible, an instructional
recipe for creating one-on-one compression routines which are targeted at
particular data types will be presented in these pages.
The main immediate reason for doing this is to assist with the task of
one-on-one compression of English text messages, of the type that are currently
frequently sent by email. Consequently the examples I give are targeted at that
domain. However, the technique presented is of general applicability, and I have
tried to phrase my explanations in such a way as to make it easy to see how to apply
the technique in other domains.