Adaptive dictionary-based compression
The discussion so far has focussed on the use of a static dictionary.
However there is no reason why an adaptive system should not be used
with the scheme.
Dynamic modifications to the dictionary may be permitted - provided they can be
made at corresponding points in the compression and decompression processes.
In practice this means that a system which modifies the dictionary in ways
which are based entirely on the decompressed text preceeding the current
position will work.
Two main operations are required:
- Adding new dictionary entries;
- Deleting dictionary entries.
Deleting dictionary entries will help to reduce the space they take up if they
recur in the future.
Adding dictionary entries during compression takes the form of verbatim copying
of them into the output stream. Next the word is checked to make sure it
does not clash with any existing dictionary entries. If it is a legal entry, it
is next added to the dictionary, with a matching "short infrequent" symbol,
taken from an ordered stack of these. If this stack is empty, infrequently used
dictionary entries may be deleted and their symbols reused.
During decompression, if a non-dictionary word is spotted, which would have
been added during the compression phase, it is again added to the dictionary.
Deleting entries whose "short, infrequent" corresponding entries are cropping up
in the pre-compressed text helps when compressing data that does not conform well
to the existing dictionary.
When compressing, deleting words, is a case of copying the corresponding
expanded dictionary entry into the compressed plaintext *once*, and then
removing the symbol. It's matching word may be recycled with a symbol
from the stack of spare symbols, or with a symbol of an infrequently-used
word.
When decompressing, if any expanded dictionary entries are found in the input
stream, they are translated into their matching entries, and then excised from
the dictionary, using the same operation applied when compressing.
The scheme described above destroys the symmetry between compression and
decompression phases.
|