
Huffman Encoding/Decoding (used for file compression)
=====================================================

-- Gives a unique encoding/decoding (meaning you can encode any bitstring,
   e.g., strings, into bits and decode the bits and get the string back). 
   The encoding is not unique, depends how you build the tree (small on left 
   or small on right). 

-- Is an optimal solution.

The idea behind the algorithm is that characters used more frequently have
a shorter encoding. Characters not used as often have a longer encoding.

These examples always put the smallest frequency value on the left.

Example one
===========

  Letter   Frequency
  E        22
  A        20
  R        15
  S        18
  P        19

Find the two characters with the lowest frequency and build a tree with the
characters as the leaves. (Either always put the smallest value on the left
or on the right. Must be consistent.) The root's frequency value is the sum of
the frequencies. No longer consider the original characters in the frequency
list. Now use the root's value in the list and find the next two smallest
frequencies. Continue this until they are all used up.

The characters R and S have the smallest frequency. Build binary tree.

      RS (33)
     /  \
    /    \
   R(15)  S(18)

Adjusted list:
  Letter   Frequency
  E        22
  A        20
  P        19
  RS       33

Now the characters P and A have the smallest frequency. 
Note that as I show the tree(s), I am cheating in that I know the answer
and put everything in the "correct" spot. On paper, you may need to redraw.

       PA(39)                 RS(33)
      /  \                   /  \
     /    \                 /    \
    P(19)  A(20)           R(15)  S(18)

Adjusted list:
  Letter   Frequency
  E        22
  RS       33
  PA       39

Now the two smallest frequencies are E and RS.

                         ERS(55)
                        /   \
                       /     \
       PA(39)         /       RS(33)
      /  \           /       /  \
     /    \         /       /    \
    P(19)  A(20)   E(22)   R(15)  S(18)

Adjusted list:
  Letter   Frequency
  PA       39
  ERS      55

Finish the tree with root combining PA and ERS.

                PAERS
             /        \
            /          \
           /            \
          /              ERS(55)
         /              /   \
        /              /     \
       PA(39)         /       RS(33)
      /  \           /       /  \
     /    \         /       /    \
    P(19)  A(20)   E(22)   R(15)  S(18)

Now label all the left edges with zero and right edges with one.

                PAERS
             /        \
            /          \  1
         0 /            \
          /              ERS(55)
         /              /   \  1
        /              /     \
       PA(39)       0 /       RS(33)
    0 /  \ 1         /     0 /  \  1
     /    \         /       /    \
    P(19)  A(20)   E(22)   R(15)  S(18)

The path of the edges to the character at the leaf is the encoding.

    P      A       E       R      S    
    00     01      10      110    111

Notice that because the frequencies are close in value, the encodings
are close in length.

Encode: PEAR   001001110
Decode: 001001110 
  Starts with zero so is either P or A. With second bit as 0, must be P.
  Next bit is one so must be E, R, or S. With the next bit zero, it must be E.
  The fifth bit is zero, sixth bit is one, so must be A.
  The last three bits as one, one, zero, conclude it must be R.


Example two
===========

  Letter   Frequency
  E        100
  A        50
  R        25
  S        20
  P        10

Build tree with P and S.

      PS (30)
     /  \
    /    \
   P(10)  S(20)

Adjusted list:
  Letter   Frequency
  E        100
  A        50
  R        25
  PS       30

Next smallest frequencies are for R and PS.

         RPS(55)
        /   \
       /     \
      /       PS (30)
     /       /  \
    /       /    \
   R(25)   P(10)  S(20)

Adjusted list:
  Letter   Frequency
  E        100
  A        50
  RPS      55

Next smallest frequencies are for A and RPS.

            ARPS(105)
           /    \
          /      \
         /        RPS(55)
        /        /   \
       /        /     \
      /        /       PS (30)
     /        /       /  \
    /        /       /    \
   A(50)    R(25)   P(10)  S(20)

Adjusted list:
  Letter   Frequency
  E        100
  ARPS     105

Finish the tree with root combining E and ARPS and label all the left edges
with zero and right edges with one.

               EARPS
              /    \ 1
             /      \
            /        ARPS(105)
           /        /    \ 1
          /        /      \
       0 /        /        RPS(55)
        /     0  /        /   \ 1
       /        /        /     \
      /        /      0 /       PS (30)
     /        /        /     0 /  \ 1
    /        /        /       /    \
   E(100)   A(50)    R(25)   P(10)  S(20)

The encoding is as follows. Notice that because the frequencies are far apart
in value, the encoding lengths vary greatly.

   E      A       R       P      S     
   0      10      110     1110   1111


