Huffman trees for non-binary alphabets?

巧了我就是萌 submitted on 2019-12-06 04:10:35

Question


Is there an easy generalization of Huffman coding trees for situations where the output alphabet is not binary? For instance, if I wanted to compress some text by writing it out in ternary, I could still build a prefix-free code for each character I was writing out. Would the straightforward generalization of the Huffman construction (using a k-ary tree rather than a binary tree) still work correctly and efficiently? Or does this construction lead to a highly inefficient coding scheme?


Answer 1:


The algorithm still works and it's still simple. In fact, Wikipedia has a brief reference to n-ary Huffman coding, citing the original Huffman paper as a source.
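A minimal sketch of the k-ary construction, to make the generalization concrete. The function name `kary_huffman` and the digit-string output are illustrative choices, not taken from the answer. The one detail beyond the binary case is padding: each merge replaces k nodes with one, so the sketch adds zero-weight dummy symbols until the node count n satisfies (n - 1) mod (k - 1) = 0, which lets every merge take exactly k nodes. For simplicity it assumes 2 <= k <= 10 so each digit is a single character.

```python
import heapq
from itertools import count

_DUMMY = object()  # marks zero-weight padding leaves


def kary_huffman(freqs, k=2):
    """Return {symbol: codeword}, codewords being strings over digits 0..k-1."""
    if not 2 <= k <= 10:
        raise ValueError("this sketch assumes 2 <= k <= 10")
    if not freqs:
        return {}
    tie = count()  # unique tiebreaker so heap tuples never compare symbols
    # Node layout: (weight, tiebreak, symbol, children); leaves have children=None.
    heap = [(w, next(tie), sym, None) for sym, w in freqs.items()]
    # Pad with dummies so that every merge can take exactly k nodes.
    while (len(heap) - 1) % (k - 1) != 0:
        heap.append((0, next(tie), _DUMMY, None))
    heapq.heapify(heap)
    while len(heap) > 1:
        children = [heapq.heappop(heap) for _ in range(k)]
        total = sum(child[0] for child in children)
        heapq.heappush(heap, (total, next(tie), None, children))

    codes = {}

    def walk(node, prefix):
        _, _, sym, children = node
        if children is None:
            if sym is not _DUMMY:
                codes[sym] = prefix or "0"  # lone-symbol alphabet gets "0"
            return
        for digit, child in enumerate(children):
            walk(child, prefix + str(digit))

    walk(heap[0], "")
    return codes


if __name__ == "__main__":
    print(kary_huffman({"a": 5, "b": 2, "c": 1, "d": 1, "e": 1}, k=3))
```

With k=2 the padding loop never runs, so the same function covers the ordinary binary case.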

It does occur to me, though, that just as Huffman is slightly suboptimal because it allocates an integer number of bits to each symbol (unlike, e.g., arithmetic coding), ternary Huffman should be a little more suboptimal because it has to allocate an integer number of trits. Not a show-stopper, especially for n = 3, but it does indicate that n-ary Huffman will fall further behind other coding algorithms as you increase n.
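The intuition above can be put as a standard worst-case bound. The symbols are introduced here for illustration: L_k is the expected codeword length of an optimal k-ary prefix code, in k-ary digits, and H_k(X) is the source entropy in the same unit.

```latex
H_k(X) \le L_k < H_k(X) + 1,
\qquad
H_k(X) = \frac{H_2(X)}{\log_2 k}
\quad\Longrightarrow\quad
0 \le L_k \log_2 k - H_2(X) < \log_2 k
```

Measured in bits, the overhead per symbol is therefore below 1 bit for binary and below log2(3) ≈ 1.585 bits for ternary. This is only a bound on the worst case; the actual gap depends on the distribution.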




Answer 2:


As an empirical test, I constructed binary and ternary Huffman trees for the distribution of Scrabble tiles.

The entropy of the distribution shows you can't get better than 4.37 bits per letter.

The binary Huffman tree uses on average 4.41 bits per letter.

The ternary Huffman tree uses on average 2.81 trits per letter, which is equivalent to about 4.45 bits per letter (one trit carries log2(3) ≈ 1.585 bits).
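A rough sketch of how such a comparison could be run. The tile counts are an assumption (the standard English Scrabble set, with the two blank tiles left out), since the answer does not say which distribution was used, so the printed values may differ slightly from the figures quoted. The helper names are hypothetical. The average length is computed without building the tree explicitly: each merge pushes every leaf beneath it one level deeper, so the sum of the merged weights equals the total weighted codeword length.

```python
import heapq
from math import log2

# Assumed counts: standard English Scrabble letter tiles, blanks excluded.
TILES = {
    "A": 9, "B": 2, "C": 2, "D": 4, "E": 12, "F": 2, "G": 3, "H": 2, "I": 9,
    "J": 1, "K": 1, "L": 4, "M": 2, "N": 6, "O": 8, "P": 2, "Q": 1, "R": 6,
    "S": 4, "T": 6, "U": 4, "V": 2, "W": 2, "X": 1, "Y": 2, "Z": 1,
}


def entropy_bits(counts):
    """Shannon entropy of the empirical distribution, in bits per symbol."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c)


def huffman_avg_length(counts, k):
    """Average codeword length, in k-ary digits, of a k-ary Huffman code."""
    heap = list(counts)
    # Zero-weight padding so that every merge takes exactly k nodes.
    while (len(heap) - 1) % (k - 1) != 0:
        heap.append(0)
    heapq.heapify(heap)
    cost = 0
    while len(heap) > 1:
        merged = sum(heapq.heappop(heap) for _ in range(k))
        cost += merged  # every leaf under this merge gains one digit
        heapq.heappush(heap, merged)
    return cost / sum(counts)


if __name__ == "__main__":
    counts = list(TILES.values())
    h = entropy_bits(counts)
    b = huffman_avg_length(counts, 2)
    t = huffman_avg_length(counts, 3)
    print(f"entropy:         {h:.2f} bits per letter")
    print(f"binary Huffman:  {b:.2f} bits per letter")
    print(f"ternary Huffman: {t:.2f} trits = {t * log2(3):.2f} bits per letter")
```

Multiplying the ternary average by log2(3) puts it in the same unit as the other two figures, which is the conversion behind the 2.81 trits ≈ 4.45 bits comparison.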



Source: https://stackoverflow.com/questions/5452522/huffman-trees-for-non-binary-alphabets
