Class RLE


public class RLE
extends java.lang.Object

Ported to Java by Tim Tyler, July 2001 -

I modified the program as follows:
+ No longer sensitive" to leading zeros in the file.
+ Now required three consecutive characters (rather than two) before RLE coding kicks in.

David Scott's comments follow:

This has been changed to be bijective and unadulterated meaning that for any file X then compress(uncompress(X))=X and uncompress(compress(X))=X. This was not the case in the earlier version by Mark.

This mode is done by David A. Scott 2000 Jan 5.

Use at your own risk.

I got it to work using DJGPP GNU C port to DOS.

The main changes are the EOF treatment and the fact that very long repeats in the old version are of form aa255a255a255aN where the a255 repeated for as many times as needed the undulterated form of this kind of exansion is aaNaaaa where the trailing a's represent 256 a's.

These changes are such that the method will always compress as well as the old RLE or better. In fact for some files with very long repeats it will compress twice as good. the first savings occur at repeat of 258 as shown in example:

baaaa..<258 a's total>...aac





In the old method aa0b and aa0a have same meaning or the aa0a form not really used since aa1 is same thing. What the old method failed to do was make wise use of the patterns so that information is being added to the file as it compresses so it should not be used if one is encyrpting since a unadulterated RLE is available that always matches or beats the old RLE method.

Another example more obvious:

take the compressed file


this when uncompressed and recompressed old way becomes


which means the first file could not be a result of the current RLE compression meaning if it was the result of guessing a key for encryption that key used is wrong.

One should avoid use compression methods that aid someone in breaking the code.

In the new method


when uncompressed and rcompressed back comes back as same file.


Mark Nelson

March 8, 1996


This program performs a Run Length Encoding function on an input file/stream, and sends the result to an output file or stream. In the output stream, any two consecutive characters with the same value flag a run. A byte following those two characters gives the count of *additional* repeat characters, which can be anything from 0 to 255.

Using the RLE program as a front end to BWT avoids pathologically slow sorts that occur when the input stream has long sequences of identical characters. (Which means comparison functions have to spend lots of time on a pair of strings before deciding who is larger.)

This program takes two arguments: an input file and an output file. You can leave off one argument and send your output to stdout. Leave off two arguments and read your input from stdin as well.

This program accompanies my article "Data Compression with the Burrows-Wheeler Transform."

Build Instructions

Define the constant unix for UNIX or UNIX-like systems. The use of this constant turns off the code used to force the MS-DOS file system into binary mode. g++ already does this, your UNIX C++ compiler might also.

Borland C++ 4.5 16 bit : bcc -w rle.cpp
Borland C++ 4.5 32 bit : bcc32 -w rle.cpp
Microsoft Visual C++ 1.52 : cl /W4 rle.cpp
Microsoft Visual C++ 2.1 : cl /W4 rle.cpp
g++ : g++ -o rle rle.cpp

Typical Use

rle < raw-file | bwt | mtf | rle | ari > compressed-file

Constructor Summary
Method Summary
 void compress(java.lang.String input_file, java.lang.String output_file)
          Compresses input_file and store the results in output_file
static void main(java.lang.String[] args)
          Handles the Commmand-line interface to the compressor
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Detail


public RLE()
Method Detail


public void compress(java.lang.String input_file,
                     java.lang.String output_file)
Compresses input_file and store the results in output_file


public static void main(java.lang.String[] args)
Handles the Commmand-line interface to the compressor