Math 3614, Homework 3
Instructions: The following problems are due
on Monday 11/18/96 at the beginning of class. You may
discuss the problems with other students in the class, but any
work you turn in must be your own.
Introduction to Huffman Codes
Huffman codes, which represent characters by variable-length bit strings,
provide alternatives to ASCII and other fixed-length codes. The idea
is to use short bit strings to represent the most frequently used
characters and to use longer bit strings to represent less frequently
used characters. In this way, it is possible to represent strings of
characters in less space than if ASCII were used.
A Huffman code is an example of a prefix code, which is described on pages
551 and 552 of the text. (In a prefix code, no character's bit string is the
beginning of another character's bit string.) What makes a Huffman code
special is that the characters are encoded according to how frequently they
occur. In short, high-frequency characters get short codes, while
low-frequency characters get long codes.
The following is a description of how to construct a tree representing
a Huffman code:
Input: A list of n characters c(i) and their frequencies f(i), i=1...n.
Output: A rooted tree that defines an optimal Huffman code corresponding
to those frequencies.
INITIALIZE: Define a set of n trees T(i), each consisting of a single node
labeled by c(i):

    T(i) =  *
          c(i)

WHILE n>1
BEGIN
  + Choose the two smallest frequencies f(i) and f(j) on the list, with
    f(i) <= f(j).
  + Construct a new tree T(n+1) as follows:

        T(n+1) =      *
                  0 /   \ 1
                   /     \
                T(i)     T(j)

    (Note that T(i) and T(j) are trees themselves.)

    <<<< NOTE: FOR PURPOSES OF THIS ASSIGNMENT, ALWAYS PUT THE SUBTREE >>>>
    <<<< WITH THE SMALLER ASSOCIATED FREQUENCY ON THE LEFT!!!          >>>>

  + Compute a new frequency f(n+1)=f(i)+f(j) to represent the frequency
    of the newly constructed tree.
  + Remove T(i) and T(j) from the list of trees and add T(n+1). Likewise,
    remove f(i) and f(j) from the list of frequencies and add f(n+1).
    Thus, there will be a total of n-1 trees and n-1 frequencies.
  + Set n=n-1 and renumber the trees and frequencies 1...n.
END
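
The procedure above can be sketched in Python. This is only an illustration,
not part of the assignment: the names build_huffman_tree and code_table are
made up here, a leaf is represented by a one-character string, and an internal
node by a (left, right) pair. The procedure does not say how to break ties
between equal frequencies; the sketch assumes that a combined tree is chosen
before a single character, and otherwise the earlier list entry is chosen.
That assumption happens to reproduce the worked examples in this handout.

```python
import heapq

def build_huffman_tree(freqs):
    """freqs: list of (character, frequency) pairs, in table order."""
    # Heap entries are (frequency, is_leaf, position, tree).
    # is_leaf and position only serve to break ties (an assumption).
    heap = [(f, 1, i, c) for i, (c, f) in enumerate(freqs)]
    heapq.heapify(heap)
    next_pos = len(heap)
    while len(heap) > 1:
        f_i, _, _, t_i = heapq.heappop(heap)  # smallest: left subtree
        f_j, _, _, t_j = heapq.heappop(heap)  # 2nd smallest: right subtree
        heapq.heappush(heap, (f_i + f_j, 0, next_pos, (t_i, t_j)))
        next_pos += 1
    return heap[0][3]

def code_table(tree, prefix=""):
    """Read the bit string of every character off the tree."""
    if isinstance(tree, str):          # leaf: a single character
        return {tree: prefix}
    left, right = tree
    table = code_table(left, prefix + "0")    # left edge is labeled 0
    table.update(code_table(right, prefix + "1"))  # right edge is labeled 1
    return table

example = [("A", 8), ("B", 2), ("C", 3), ("D", 2)]
print(code_table(build_huffman_tree(example)))
# {'C': '00', 'B': '010', 'D': '011', 'A': '1'}
```

A priority queue (heapq) replaces the repeated search for the two smallest
frequencies; the resulting tree is the same as long as ties are broken the
same way.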
As an example of the above procedure, consider the following frequency
table:

          Letter   Frequency
     i     c(i)      f(i)
    --    ------   ---------
     1      A          8
     2      B          2
     3      C          3
     4      D          2

On the first pass through the WHILE loop, frequencies i=2 and j=4 are
the smallest, so we construct the tree

    T(n+1) =    *
            0 /   \ 1
             /     \
            *       *
            B       D

The revised frequency table is then

          Letter   Frequency
     i     c(i)      f(i)
    --    ------   ---------
     1      A          8
     2      C          3
     3      B,D        4

On the next pass, frequencies i=2 and j=3 are the smallest, so we construct
the tree

    T(n+1) =    *
            0 /   \ 1
             /     \
            *       *
            C   0 /   \ 1
                 /     \
                *       *
                B       D

The revised frequency table is then

          Letter   Frequency
     i     c(i)      f(i)
    --    ------   ---------
     1      A          8
     2      B,C,D      7

On the last pass, we choose i=2 (the smallest frequency) and j=1 (the second
smallest),
and construct the tree

    T(n+1) =
                *
            0 /   \ 1
             /     \
            *       *
        0 /   \ 1   A
         /     \
        *       *
        C   0 /   \ 1
             /     \
            *       *
            B       D

We can then read this tree to determine the encoding scheme:
A:1, C:00, B:010, D:011
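
As a check on the claim that such a code saves space, the following Python
sketch encodes a word with the scheme above and counts bits. It assumes the
frequencies in the example table are occurrence counts in a 15-character
text, and it compares against a fixed-length code of 2 bits per character
(the shortest fixed length for four characters) rather than ASCII. The names
encode and total_bits are made up for this illustration.

```python
# Encoding scheme read off the example tree.
CODES = {"A": "1", "C": "00", "B": "010", "D": "011"}
# Frequencies from the example table, taken here as occurrence counts.
FREQS = {"A": 8, "B": 2, "C": 3, "D": 2}

def encode(word, codes):
    """Concatenate the bit strings of the characters in word."""
    return "".join(codes[ch] for ch in word)

def total_bits(freqs, codes):
    """Bits needed when each character occurs freqs[ch] times."""
    return sum(freqs[ch] * len(codes[ch]) for ch in freqs)

print(encode("ABCD", CODES))      # 101000011
print(total_bits(FREQS, CODES))   # 26
print(2 * sum(FREQS.values()))    # 30 (fixed-length, 2 bits each)
```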
To make sure you understand this procedure correctly, construct the Huffman
code for the characters A,B,C,D,E with the following frequency table:

    Character    Frequency
    ----------   ---------
        A            6
        B            3
        C            4
        D            3
        E            8

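You should get the following encoding scheme (on the second pass, A and the
tree B,D tie with frequency 6; this answer combines C with the tree B,D):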
A:10, E:11, C:00, B:010, D:011
Problems
1. Construct the tree representing the Huffman code for the
   following frequency table (by hand):

       Letter   Frequency
       ------   ---------
         A         20
         B          6
         C         25
         E         30
         L          5
         N         10
         O         15
         R          4
         T         22
         V          2
         X          1

2. Based on the Huffman code you constructed above, encode each of the
   following words (by hand):
   - ABLE (Your answer should be 1010001000001)
   - CELLO
   - EXCEL
3. Programming Assignment: (See the handout "Guidelines for programming
   assignments" for instructions concerning programming assignments.)
Write a computer program that stores the tree you constructed above, and
uses it to decode bit-strings according to the corresponding Huffman code.
Your program should accept as input a sequence of bit strings, and return
as output the decoded word for each bit string. A typical run of your
program should look something like:
Enter a bit string: Type -1 to quit.
>>1010001000001
The decoded word is: ABLE
Enter a bit string: Type -1 to quit.
>>1
ERROR--The bit string is not valid
Enter a bit string: Type -1 to quit.
>>-1
Test your code with the following bit strings:
- The three bit strings from your answer to problem 2. (Note: if
you do not get answers of ABLE, CELLO, and EXCEL, then there is a bug
in your program).
- 1101010001000001 (This should decode to the word TABLE).
- 011000001110100000000011001110
- 000110001101100001001
(Note: if you do not get recognizable words for these last three
bit strings, then there is probably something wrong with the tree you
constructed in problem 1).
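
The decoding step can be sketched in Python on the small four-letter example
code from the introduction (A:1, C:00, B:010, D:011), not on the tree from
problem 1. This is only one possible approach: the name decode, the
nested-pair representation of the tree, and the use of None to signal an
invalid bit string are all assumptions made for this illustration.

```python
# Example tree from the introduction: a leaf is a one-character string,
# an internal node is a (left, right) pair; left is bit 0, right is bit 1.
EXAMPLE_TREE = (("C", ("B", "D")), "A")

def decode(bits, tree):
    """Return the decoded word, or None if bits is not a valid bit string.

    Assumes the tree has at least two leaves.
    """
    word = []
    node = tree
    for bit in bits:
        if bit not in ("0", "1"):
            return None               # not a bit at all
        node = node[int(bit)]         # follow the edge labeled by this bit
        if isinstance(node, str):     # reached a leaf: emit its character
            word.append(node)
            node = tree               # restart at the root
    if node is not tree:
        return None                   # input ended in the middle of a code
    return "".join(word)

print(decode("100010", EXAMPLE_TREE))  # ACB
print(decode("01", EXAMPLE_TREE))      # None
```

Because the code is a prefix code, reaching a leaf always marks the end of
exactly one character, so no separators between characters are needed.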
For hints on how to do the programming assignment click
here.