Math 3614, Homework 3

Instructions: The following problems are due on Monday 11/18/96 at the beginning of class. You may discuss the problems with other students in the class, but any work you turn in must be your own.

Introduction to Huffman Codes

Huffman codes, which represent characters by variable-length bit strings, provide alternatives to ASCII and other fixed-length codes. The idea is to use short bit strings to represent the most frequently used characters and to use longer bit strings to represent less frequently used characters. In this way, it is possible to represent strings of characters in less space than if ASCII were used. A Huffman code is an example of a prefix code (no character's bit string is a prefix of another character's bit string, so an encoded string can be decoded unambiguously), which is described on pages 551 and 552 of the text. What makes a Huffman code special is that the characters are encoded based on the frequencies with which the various characters occur. In short, high-frequency characters get short codes, while low-frequency characters get long codes. The following is a description of how to construct a tree representing a Huffman code:
Input:  A list of n characters c(i) and their frequencies f(i), i=1...n.

Output: A rooted tree that defines an optimal Huffman code corresponding
to those frequencies.

INITIALIZE: Define a set of n trees T(i) each consisting of a single node
labeled by c(i):

           T(i)  =           * 
                            c(i)

WHILE n>1
  BEGIN
    + Choose the two smallest frequencies f(i) and f(j) on the list
    + Construct a new tree T(n+1) as follows:


          T(n+1) =            *
                           0 / \ 1
                            /   \
                         T(i)   T(j)


          (Note that T(i) and T(j) are trees themselves.)

<<<<     NOTE: FOR PURPOSES OF THIS ASSIGNMENT, ALWAYS PUT THE SUBTREE  >>>>
<<<<     WITH THE SMALLEST ASSOCIATED FREQUENCY ON THE LEFT!!!          >>>>


    + Compute a new frequency f(n+1)=f(i)+f(j) to represent the frequency
      of the newly constructed tree.
    + Remove T(i) and T(j) from the list of trees and f(i) and f(j) from
      the list of frequencies, and add T(n+1) and f(n+1) to the lists.
      Thus, there will be a total of n-1 trees and n-1 frequencies.
    + Set n=n-1
  END
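The loop above can be sketched in Python as follows. This is a minimal sketch, not a reference solution: the tree representation (a leaf is a one-character string, an internal node is a (left, right) tuple) and the function name build_huffman_tree are assumptions made for illustration. The handout does not say how to break ties between equal frequencies; the sketch assumes each newly merged tree goes to the front of the list and that, among equal frequencies, the tree earlier in the list is taken first. That rule was chosen because it reproduces both worked examples below.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical representation: a leaf is a one-character string,
# an internal node is a (left, right) tuple (left edge = 0, right edge = 1).
Tree = Union[str, Tuple["Tree", "Tree"]]


def build_huffman_tree(freqs: Dict[str, int]) -> Tree:
    """Build a Huffman tree by repeatedly merging the two lowest-frequency trees.

    Assumed tie-breaking rule (the handout does not specify one): each newly
    merged tree is put at the front of the list, and among equal frequencies
    the tree that appears earlier in the list is taken first.
    """
    if not freqs:
        raise ValueError("need at least one character")
    # INITIALIZE: one single-node tree T(i) per character c(i), in table order.
    forest: List[Tuple[int, Tree]] = [(f, c) for c, f in freqs.items()]
    while len(forest) > 1:
        # Python's sort is stable, so equal frequencies keep their list order.
        forest.sort(key=lambda pair: pair[0])
        (f_i, t_i), (f_j, t_j) = forest[0], forest[1]
        # Smaller frequency on the left, as the NOTE requires.
        merged: Tree = (t_i, t_j)
        # Drop T(i) and T(j); add the new tree with frequency f(i) + f(j).
        forest = [(f_i + f_j, merged)] + forest[2:]
    return forest[0][1]


print(build_huffman_tree({"A": 8, "B": 2, "C": 3, "D": 2}))
# (('C', ('B', 'D')), 'A')
```

Applied to the practice table (A 6, B 3, C 4, D 3, E 8), the same rule gives (('C', ('B', 'D')), ('A', 'E')). A different tie-breaking rule can produce a different tree, though any tree built this way is still an optimal code for the given frequencies.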
As an example of the above procedure, consider the following frequency table:
      Letter       Frequency
i      c(i)           f(i)
--    ------       ---------
1        A              8
2        B              2
3        C              3
4        D              2
On the first pass through the WHILE loop, frequencies i=2 and j=4 are the smallest, so we construct the tree
          T(n+1) =           *
                          0 / \ 1
                           /   \
                          *     *
                         B       D
The revised frequency table is then
      Letter       Frequency
i      c(i)           f(i)
--    ------       ---------
1       A              8
2       C              3
3      B,D             4
On the next pass, frequencies i=2 and j=3 are the smallest, so we construct the tree
          T(n+1) =           *
                          0 / \ 1
                           /   \
                          *     *
                         C   0 / \ 1
                              /   \
                             *     *
                            B       D
The revised frequency table is then
      Letter       Frequency
i      c(i)           f(i)
--    ------       ---------
1       A              8
2       B,C,D          7
On the last pass, we choose i=2 (the smallest frequency) and j=1 (the second smallest), and construct the tree
          T(n+1) = 
                                *
                             0 / \ 1
                              /   \
                             *     *
                          0 / \ 1   A
                           /   \
                          *     *
                         C   0 / \ 1
                              /   \
                             *     *
                            B       D
We can then read this tree to determine the encoding scheme:
   A:1, C:00, B:010, D:011
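Reading the tree is a walk from the root that appends 0 for each left branch and 1 for each right branch until a leaf is reached. A minimal Python sketch, assuming a hypothetical representation in which a leaf is a one-character string and an internal node is a (left, right) tuple; the names code_table and encode are illustrative, and the tree literal is the final tree of this example transcribed by hand. Encoding a word is then just concatenating the codes of its letters.

```python
from typing import Dict, Tuple, Union

# Hypothetical representation: a leaf is a one-character string,
# an internal node is a (left, right) tuple.
Tree = Union[str, Tuple["Tree", "Tree"]]


def code_table(tree: Tree, prefix: str = "") -> Dict[str, str]:
    """Return {character: bit string}, using 0 for left edges and 1 for right."""
    if isinstance(tree, str):
        # A leaf: its code is the path taken to reach it ("0" if the tree is a single leaf).
        return {tree: prefix or "0"}
    left, right = tree
    table = code_table(left, prefix + "0")
    table.update(code_table(right, prefix + "1"))
    return table


def encode(word: str, table: Dict[str, str]) -> str:
    """Encode a word by concatenating the codes of its characters."""
    return "".join(table[ch] for ch in word)


# Final tree of the worked example, transcribed from the diagram above.
example_tree: Tree = (("C", ("B", "D")), "A")
codes = code_table(example_tree)
print(codes)                 # {'C': '00', 'B': '010', 'D': '011', 'A': '1'}
print(encode("BAD", codes))  # 0101011
```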
To make sure you understand this procedure correctly, construct the Huffman code for the characters A, B, C, D, E with the following frequency table:
Character        Frequency
----------       ---------
    A                6
    B                3
    C                4
    D                3
    E                8
You should get the following encoding scheme:
   A:10, E:11, C:00, B:010, D:011
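A quick way to check a result like this one, sketched in Python below with hypothetical function names, is to confirm that no codeword is a prefix of another and to count the bits the code needs. Treating each frequency as a count of occurrences (an assumption), a 24-character string with these frequencies takes 54 bits under this code, compared with 72 bits for a fixed-length code (five characters need 3 bits each) and 192 bits in 8-bit ASCII.

```python
from itertools import permutations
from typing import Dict

# Practice frequency table and the code the handout says you should get.
freqs: Dict[str, int] = {"A": 6, "B": 3, "C": 4, "D": 3, "E": 8}
codes: Dict[str, str] = {"A": "10", "E": "11", "C": "00", "B": "010", "D": "011"}


def is_prefix_free(table: Dict[str, str]) -> bool:
    """True if no codeword is a prefix of (or equal to) a different codeword."""
    return not any(a.startswith(b) for a, b in permutations(table.values(), 2))


def total_bits(freqs: Dict[str, int], table: Dict[str, str]) -> int:
    """Bits needed for a string in which character c occurs freqs[c] times."""
    return sum(f * len(table[c]) for c, f in freqs.items())


n_chars = sum(freqs.values())     # 24 characters in all
print(is_prefix_free(codes))      # True
print(total_bits(freqs, codes))   # 54
print(3 * n_chars, 8 * n_chars)   # 72 192 (3-bit fixed-length code, 8-bit ASCII)
```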

Problems

  1. Construct the tree representing the Huffman code for the following frequency table (by hand):
    
    Letter    Frequency
    ------    ---------
      A          20
      B           6
      C          25
      E          30
      L           5
      N          10
      O          15
      R           4
      T          22
      V           2
      X           1
    
  2. Based on the Huffman code you constructed above, encode each of the following words (by hand):
    1. ABLE (Your answer should be 1010001000001)
    2. CELLO
    3. EXCEL
  3. Programming Assignment: (See the handout "Guidelines for programming assignments" for instructions concerning programming assignments.)

    Write a computer program that stores the tree you constructed above and uses it to decode bit strings according to the corresponding Huffman code. Your program should accept as input a sequence of bit strings, and return as output the decoded word for each bit string. A typical run of your program should look something like:
    Enter a bit string:  Type -1 to quit.
    >>1010001000001
    The decoded word is: ABLE
    
    Enter a bit string:  Type -1 to quit.
    >>1
    ERROR--The bit string is not valid
    
    Enter a bit string:  Type -1 to quit.
    >>-1
    
    Test your code with the following bit strings:
    1. The three bit strings from your answer to problem 2. (Note: if you do not get ABLE, CELLO, and EXCEL as answers, then there is a bug in your program.)
    2. 1101010001000001 (This should decode to the word TABLE).
    3. 011000001110100000000011001110
    4. 000110001101100001001
    (Note: if you do not get recognizable words for these last three bit strings, then there is probably something wrong with the tree you constructed in problem 1).
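One possible structure for the decoder is to walk the tree from the root, going left on 0 and right on 1, and to emit a character and return to the root each time a leaf is reached; a bit string is then invalid if it contains a symbol other than 0 or 1, or if it ends partway down the tree. The Python sketch below illustrates this under stated assumptions and is not the required solution: it uses the four-letter tree from the worked example (A:1, C:00, B:010, D:011) rather than the tree from Problem 1, and the nested-tuple representation and the name decode are hypothetical.

```python
from typing import Tuple, Union

# Hypothetical representation: a leaf is a one-character string,
# an internal node is a (left, right) tuple (left = bit 0, right = bit 1).
Tree = Union[str, Tuple["Tree", "Tree"]]

# Tree from the four-letter worked example: A:1, C:00, B:010, D:011.
EXAMPLE_TREE: Tree = (("C", ("B", "D")), "A")


def decode(bits: str, root: Tree) -> str:
    """Decode a bit string by walking the tree; raise ValueError if it is invalid."""
    if isinstance(root, str):
        raise ValueError("tree must have at least two leaves")
    word = []
    node = root
    for bit in bits:
        if bit not in "01":
            raise ValueError("invalid bit string")
        node = node[int(bit)]  # 0 -> left child, 1 -> right child
        if isinstance(node, str):  # reached a leaf: emit it, restart at the root
            word.append(node)
            node = root
    if node is not root or not word:
        # Stopped partway down the tree, or the input was empty.
        raise ValueError("invalid bit string")
    return "".join(word)


for bits in ["0101011", "01", "1002"]:
    try:
        print("The decoded word is:", decode(bits, EXAMPLE_TREE))
    except ValueError:
        print("ERROR--The bit string is not valid")
```

With this tree the loop prints "The decoded word is: BAD" followed by two error lines. Reading lines in a loop until -1 is entered would give an interaction like the sample run above.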
For hints on how to do the programming assignment, see the separate hints handout.