Reverse Parsing - a sister scheme
The discussion here has considered parsing the file from top to bottom, and
looking up words in the dictionary starting with their first letter and
proceeding towards their last. In other wiords the direction of motion through
the file and the direction of motion through words in the dictionary have both
been "forwards".
This is not the only possible parsing scheme. An alternative where
these two vectors point in opposite direction produces a different algorithm,
with different constraints on the dictionary. However, it appears that
the end result will be closely equivalent.
The algorithm will be presented anyway. It's alternative perspective may throw
light on the workings of the technique, or provide other insights.
Start at the end of the file, work backwards, one character at a time, until either:
- The start of the file is reached, or...
- It is detected that the current character is the first letter of a
dictionary entry.
In the first case, prepend these letters to the start of the output file, and stop.
In the second case, prepend a string corresponding to the matching dictionary
entry for the symbol found, followed by any characters after the end of
the dictionary entry found and before the end of the input file.
After this has been done, truncate the input file at the start of the dictionary
entry just located and repeat - until the start of the input file has been located.
|
When using this algorithm, the "no substrings" condition (presented in the
document about static dictionary compression) is no longer required.
However, in its place another condition must be used. This is that if
one string in the dictionary is a prefix of another one, the shorter of the two
strings must always be used.
The "leading strings never match trailing strings" condition (also presented in
the document about static dictionary compression)
must, however, remain in force.
Eliminating the "no substrings condition" means that less time need be spent
"cleaning" the dictionary. On the other hand, any entries remaining in the
dictionary which contain other entries as partial substrings wind up never being
used - and are just dead flesh.
With a properly cleaned an efficient dictionary, the two schemes appear to be
equivalent to one another. One remaining question is whether one approach
may be implemened with greater computational efficiency than the other.
|