UPDATE: Thanks to steve for pointing out that the scheme I describe here is essentially the same as Algorithm BSTW.
I came up with this variation of the MovetoFront transform which doesn’t require that there be a known alphabet of symbols. Instead it builds up the alphabet as it goes along and encounters a new symbol. I’m sure that this is a known algorithm, as it’s a fairly obvious variation, but I don’t know what the proper name for it is, so I’m calling it an Adaptive Move toFront.
I’m very interested in hearing if this is similar or identical to any other algorithms, so if anyone has any insights, please comment!
Differences from the Standard MTF Transform:
The traditional MTF algorithm, as I understand it, uses some known ordered alphabet of symbols (e.g. the bytes from 0255, or the letters az), so presumably, one need not store this alphabet along with the encoded/compressed data.
With the algorithm I describe here though, the final permutation of the alphabet list, output by the encoder, must be retained to decode the message. The encoded message is identical to the output of the traditional MTF transform, except where a symbol is encoded which has not appeared previously.
Here we have zipped the output of the standard MTF (the fst
) with the
adaptive version (the snd
) to make it easy to compare elements:
Message: “the rrrrain in sssspain falls maaiinly on the plain”
Output (standard MTF, adaptive):
[(116,0),(105,1),(103,2),(35,3),(115,4),(0,0),(0,0),(0,0),(101,5),(107,6),(11 2,7),(4,4),(2,2),(2,2),(2,2),(116,8),(0,0),(0,0),(0,0),(115,9),(5,5),(5,5),(5, 5),(5,5),(109,10),(4,4),(113,11),(0,0),(7,7),(4,4),(114,12),(4,4),(0,0),(7,7), (0,0),(7,7),(6,6),(121,13),(6,6),(116,14),(4,4),(2,2),(14,14),(14,14),(14,14), (3,3),(13,13),(8,8),(10,10),(10,10),(8,8)]
The standard algorithm was working with the ASCII alphabet, so each new symbol is located somewhere far back into the alphabet; it is then moved to the front. In the adaptive version, when we see a new symbol, it is automatically made the next in our running alphabet and then of course moved to the front.
This means that with the adaptive version, for any given stretch from the beginning of the encoded message, we are using the minimal number of different index integers to encode the stream, always choosing the next greatest integer when we require a new one. Here is the output of the adaptive version alone:
Encoded phrase:
[0,1,2,3,4,0,0,0,5,6,7,4,2,2,2,8,0,0,0,9,5,5,5,5,10,4, 11,0,7,4,12,4,0,7,0,7,6,13,6,14,4,2,14,14,14,3,13,8, 10,10,8]
Final permutation of minimal dictionary list:"nialp ehtoymsfr"
What these means for actual compression schemes on a binary level, I have no idea, but I would like to explore this topic much more. Without further ado, here is some code.
The Encoder:
In literate haskell:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 

The Decoder
Our decoder requires that we pass it the final permutation of the symbol list, along with the encoded list of indexes. It then simply does the reverse of the encoding function.
To decode we follow these steps:
Return the head of the list of symbols
Reinsert the symbol into the index location specified by the head of the encoded/index list
Move on to the next encoded index, using the new list of symbols. go to (1)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 
