For my assignment, I am to do a encode and decode for huffman trees. I have a problem creating my tree, and I am stuck.
Don\'t mind the print statements - they are just
The Huffman algorithm in Wikipedia tells you exactly how to create the node tree, so your program can be based on that algorithm, or another like it. Here is a Python program with comments showing the corresponding wikipedia algorithm step. The test data is frequencies of the letters of the alphabet in English text.
Once the node tree is created, you need to walk it down to assign Huffman codes to each symbol in your dataset. Since this is homework, that step is up to you, but a recursive algorithm is the simplest and most natural way to handle it. It's only six more lines of code.
import queue
class HuffmanNode(object):
def __init__(self, left=None, right=None, root=None):
self.left = left
self.right = right
self.root = root # Why? Not needed for anything.
def children(self):
return((self.left, self.right))
freq = [
(8.167, 'a'), (1.492, 'b'), (2.782, 'c'), (4.253, 'd'),
(12.702, 'e'),(2.228, 'f'), (2.015, 'g'), (6.094, 'h'),
(6.966, 'i'), (0.153, 'j'), (0.747, 'k'), (4.025, 'l'),
(2.406, 'm'), (6.749, 'n'), (7.507, 'o'), (1.929, 'p'),
(0.095, 'q'), (5.987, 'r'), (6.327, 's'), (9.056, 't'),
(2.758, 'u'), (1.037, 'v'), (2.365, 'w'), (0.150, 'x'),
(1.974, 'y'), (0.074, 'z') ]
def create_tree(frequencies):
p = queue.PriorityQueue()
for value in frequencies: # 1. Create a leaf node for each symbol
p.put(value) # and add it to the priority queue
while p.qsize() > 1: # 2. While there is more than one node
l, r = p.get(), p.get() # 2a. remove two highest nodes
node = HuffmanNode(l, r) # 2b. create internal node with children
p.put((l[0]+r[0], node)) # 2c. add new node to queue
return p.get() # 3. tree is complete - return root node
node = create_tree(freq)
print(node)
# Recursively walk the tree down to the leaves,
# assigning a code value to each symbol
def walk_tree(node, prefix="", code={}):
return(code)
code = walk_tree(node)
for i in sorted(freq, reverse=True):
print(i[1], '{:6.2f}'.format(i[0]), code[i[1]])
When run on the alphabet data, the resulting Huffman codes are:
e 12.70 100
t 9.06 000
a 8.17 1110
o 7.51 1101
i 6.97 1011
n 6.75 1010
s 6.33 0111
h 6.09 0110
r 5.99 0101
d 4.25 11111
l 4.03 11110
c 2.78 01001
u 2.76 01000
m 2.41 00111
w 2.37 00110
f 2.23 00100
g 2.02 110011
y 1.97 110010
p 1.93 110001
b 1.49 110000
v 1.04 001010
k 0.75 0010111
j 0.15 001011011
x 0.15 001011010
q 0.10 001011001
z 0.07 001011000