Prefix Codes Prefix codes are widely used in applications that compress data, in
ID: 3624461 • Letter: P
Question
Prefix Codes
Prefix codes are widely used in applications that compress data, including JPEG for images and MP3 for music. A binary prefix code is a mapping from some alphabet A (often some subset of the set of ASCII characters) to the set of binary strings. It is most easily represented by a binary tree in which the leaf nodes are labelled with the characters of A. The encoding of a character is determined by the path from the root of the tree to the leaf node that holds the character. A 0 indicates that a left branch is taken, a 1 indicates that a right branch is taken.
As an example, consider the tree in the figure below. It represents a prefix code for the alphabet consisting of characters a, b, c, d, r and !. Note that different characters are encoded with potentially different numbers of bits. For example, the character a is encoded with a single bit, while the character d is encoded with four bits. This is useful when encoding messages in which the character a appears more frequently than the character d. The overall number of bits required to represent the message can be minimised by choosing short encodings for characters that occur frequently, and longer encodings for infrequent ones.
A fundamental property of prefix codes is that messages can be formed by simply stringing together the code bits from left to right. Using the code given above, try to decode the bit string
0111110010110101001111100100
The first 0 must encode a since no other encoding of a character starts with 0. Similarly, the next three 1s must encode b. Then, 110 must encode r, and so on. Altogether, the bit string encodes the message abracadabra!. There is no need for a separator since no encoding of a character is a prefix of another one, which of course is where the term "prefix codes" comes from. To decode a given bit string, simply do the following:
This process is repeated until all the bits in the encoded message are exhausted.
The Assignment
Write a C program that reads in a binary tree representing a prefix code and an encoded message, and that prints out the decoded message. The input is to be read from standard input and consists of a description of the binary tree, followed by a newline character and the encoded message. The description of the tree consists of its preorder traversal, where internal nodes are labeled with the special character ~ (that is assumed not to occur in the alphabet). As an example, the tree and message above together form the input file:
~a~~!~dc~rb
0111110010110101001111100100
The task of writing the program can be subdivided into three parts:
1. Building the tree: Write a (recursive) function makeTree that reads in the preorder traversal and constructs the corresponding tree. Represent the tree using the standard binary tree data structure:
2. Traversing the tree: Write a function traverseTree that traverses the binary tree. It prints out a table of the characters in the leaf nodes of the tree and the respective lengths (numbers of bits) of their encodings. For the example above, traverseTree should produce the following output (written to standard error):
Assume that the encoded characters may include the tab and newline characters. In the table you print out, those should be treated as special cases and represented by and , respectively.
3. Decoding the message: Write a function decodeMessage that reads the encoded message from standard input, and writes the decoded message to standard output. It should also print to standard error the number of bits read in, the number of characters in the original message, and the compression factor. For the above example, decodeMessage prints abracadabra! to standard output. The original message contains 12 characters which would normally require 96 bits of storage (8 bits per character). The encoded message uses only 28 bits, or 29.2% of the space required without the encoding. The output written to standard error should therefore be
In general, the compression factor depends on the frequency of the characters in the message. Note that for large messages, the amount of space needed to store the description of the tree is negligible compared to storing the message itself, so we have ignored this quantity in the calculation. Also, for simplicity, the encoded message is a sequence of the characters 0 and 1. In order to be useful, these bits would of course need to be packed eight to the byte. You may ignore that for this assignment.
Explanation / Answer
Please give me the points I really need them. I appreciate :)