This is the second of probably three posts (the first was on run-length encoding in Haskell) inspired by Thomas Guest’s interesting article on the Deflate algorithm. This is also my first post in literate haskell. Please post any improvements to the code if you have them! For a refreshingly readable introduction to Huffman Coding and the Deflate algorithm, have a look at this short explanation by Antaeus Feldspar.
First some boilerplate…
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 | |
Building the Coding Trees
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 | |
(En/De)coding Functions
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 | |
Usage Examples
UPDATE: I scrapped and re-did this section. Should be a little better now.:
I realize I need a more compelling example and some explanation. First, imagine we want to encode into binary the following phrase:
1
| |
we could represent it in ASCII but that would be wasteful of space, using a whole byte per character when we are using only ten of the 256 possible codes in the ASCII character set.
So we generate a list of the characters to encode along with their frequencies (symbols with higher frequencies will be given shorter prefix codes, saving space!):
1 2 3 | |
…and encode it using the dictionary built from the code tree we just built:
1 2 | |
This yields the following binary stream of 8 bits (vs. 20 if we had used ASCII encoding):
1 2 | |
Of course we have to encode instructions to build our tree along with the above message.