BIRLE

BIRLE is a bijective Run Length Encoding (RLE) compressor implemented in Java. Like BIAC, it is a port of a compressor originally written by Mark Nelson and modified by David Scott to make it bijective.

I've modified it so that it is no longer compatible with the original. I felt the original was not conservative enough: it had too great a tendency to expand perfectly normal files. I therefore changed it to treat a run as anything starting with three repeated characters (rather than the original two). This modification increased the total compression ratio on the corpus from 11.49% to 12.72%. Increasing the run threshold further appeared to worsen results overall.

This code is intended to act as a pre-processor for other compression schemes. It leaves the overall format of most files more or less intact (it just eliminates runs), leaving the floor open for subsequent compressors of different types. The improvement to expect is relatively small on normal files, but larger on files that contain many runs. For example, BIAC reduces the files in the corpus from 3,312,291 bytes to 1,841,675 bytes, a reduction of some 44.39%. If BIRLE is applied first, the result is 1,821,284 bytes, an overall compression ratio of 45.01%. The saving is 20,391 bytes, or 0.62% of the original total.

There is a GUI front end as well as a command-line interface. Here's a snapshot of it in action:
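The effect of the run threshold can be illustrated with a minimal sketch. This is not BIRLE's code or file format, and it is not bijective: it is a plain byte-oriented RLE, assuming a hypothetical scheme in which THRESHOLD identical bytes are copied literally and then followed by one count byte (0 to 255) giving the number of further repeats. The class SimpleRle and its methods are hypothetical names. In this scheme a run of exactly THRESHOLD bytes costs one extra byte ("aaa" becomes 4 bytes), so with a threshold of two every doubled letter in ordinary text would cost a byte, which is one way a low threshold causes expansion. The decoder also rejects a stream that ends right after a trigger with no count byte; a bijective coder, by definition, has no such invalid inputs.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/**
 * Hypothetical, minimal RLE sketch. NOT BIRLE's format and NOT bijective.
 * THRESHOLD identical bytes are copied literally and followed by one count
 * byte (0..255) giving how many further repeats of that byte follow.
 */
public class SimpleRle {
    static final int THRESHOLD = 3;   // a run starts with three repeated bytes
    static final int MAX_EXTRA = 255; // largest count that fits in one byte

    public static byte[] encode(byte[] in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < in.length) {
            byte b = in[i];
            // Measure the run at i, capped so the extra count fits in a byte.
            int run = 1;
            while (i + run < in.length && in[i + run] == b
                    && run < THRESHOLD + MAX_EXTRA) {
                run++;
            }
            if (run >= THRESHOLD) {
                for (int k = 0; k < THRESHOLD; k++) out.write(b);
                out.write(run - THRESHOLD); // repeats beyond the trigger
            } else {
                for (int k = 0; k < run; k++) out.write(b); // short run: literal
            }
            i += run;
        }
        return out.toByteArray();
    }

    public static byte[] decode(byte[] in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int prev = -1; // previous literal byte; -1 at the start and after a count byte
        int same = 0;  // length of the current streak of identical literals
        int i = 0;
        while (i < in.length) {
            int b = in[i++] & 0xFF;
            out.write(b);
            same = (b == prev) ? same + 1 : 1;
            prev = b;
            if (same == THRESHOLD) {
                if (i >= in.length) {
                    // A bijective coder has no invalid inputs; this one does.
                    throw new IllegalArgumentException("missing run count");
                }
                int extra = in[i++] & 0xFF;
                for (int k = 0; k < extra; k++) out.write(b);
                prev = -1; // the count byte closes the run
                same = 0;
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] text = "aab cccc ddddddddd".getBytes(StandardCharsets.US_ASCII);
        byte[] packed = encode(text);
        System.out.println(text.length + " -> " + packed.length + " bytes");
        System.out.println(Arrays.equals(text, decode(packed)));
    }
}
```

On the sample string, main prints `18 -> 13 bytes` followed by `true`: the pair "aa" is left alone, while the runs of four c's and nine d's each shrink to four bytes.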
Download the executable Jar file.
Download the Java source code.
Browse the source code.
Browse the associated javadoc.
Results on the Calgary Corpus test suite can be found here.