Maximum number of different numbers, Huffman Compression

清歌不尽 2020-12-20 10:55

I want to compress many 32-bit numbers using Huffman compression.

Each number may appear multiple times, and I know that every number will be replaced with some bit se

2 Answers
  •  情书的邮戳
    2020-12-20 11:01

    Huffman is about compression, and compression requires a "skewed" distribution to work (assuming we are talking about normal, order-0, entropy).

    The worst situation regarding Huffman tree depth is when the algorithm creates a degenerate tree, i.e. one with only one leaf per level. This can happen if the symbol counts follow a Fibonacci series.

    Therefore, the worst distribution sequence looks like this : 1, 1, 1, 2, 3, 5, 8, 13, ....

    In this case, you fill the full 32-bit tree with only 33 different elements.

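    Note, however, that to reach a 32-bit depth with only 33 elements, the most numerous element must appear 2 178 309 times (the largest of the 33 terms in the sequence above).

    Therefore, since summing these 33 counts gives 5 702 887, you need to compress at least 5 702 887 numbers before any code can reach 32 bits. Going past 32 bits takes a 34th element and at least 9 227 465 numbers, so only beyond that point is there a risk of not being able to represent them with a 32-bit Huffman tree.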
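The degenerate case can be checked with a short sketch. The helper names `fibonacci_like_counts` and `max_code_length` are made up for this illustration, and the tie-breaking rule (when two weights are equal, merge the taller subtree first) is an assumption chosen to expose the worst case; a real encoder may break ties differently and end up with a shallower tree.

```python
import heapq


def fibonacci_like_counts(n):
    """Return the first n terms of 1, 1, 1, 2, 3, 5, 8, 13, ..."""
    counts = [1, 1, 1]
    while len(counts) < n:
        counts.append(counts[-1] + counts[-2])
    return counts[:n]


def max_code_length(counts):
    """Length in bits of the longest Huffman code for the given counts.

    Heap entries are (weight, -height), so among equal weights the
    taller subtree is popped first (assumed worst-case tie-breaking).
    """
    heap = [(c, 0) for c in counts]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, h1 = heapq.heappop(heap)
        w2, h2 = heapq.heappop(heap)
        # Heights are stored negated: the merged node is one level
        # taller than the taller of its two children.
        heapq.heappush(heap, (w1 + w2, min(h1, h2) - 1))
    return -heap[0][1]


counts = fibonacci_like_counts(33)
print(len(counts), sum(counts), max_code_length(counts))
```

With 33 counts the total is 5 702 887 and the longest code is 32 bits; passing 34 to `fibonacci_like_counts` gives the first input whose longest code no longer fits in 32 bits.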

    That being said, using a Huffman tree to represent 32-bit numbers requires a considerable amount of memory to calculate and maintain the tree.

    [Edit] A simpler format, called "logarithm approximation", gives almost the same weight to all symbols. In this case, only the total number of symbols is required.

    It is very fast to compute: for 300 symbols, say, some will use 8 bits and the others 9 bits. The formula to decide how many of each type:

    9 bits : (300-256)*2 = 44*2 = 88 ; 8 bits : 300 - 88 = 212

    Then you can distribute the numbers as you wish (preferably the most frequent ones using 8 bits, but that's not important).
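The 8-bit / 9-bit split above generalizes to any symbol count. A minimal sketch, assuming the shorter length is floor(log2(n)) bits; the function name `split_code_lengths` is hypothetical and only mirrors the arithmetic shown for 300 symbols.

```python
def split_code_lengths(n):
    """Split n symbols into short and long codes (long = short + 1 bit).

    Returns (short_bits, short_count, long_count).
    """
    short_bits = n.bit_length() - 1              # floor(log2(n)) for n >= 1
    long_count = (n - (1 << short_bits)) * 2     # e.g. (300 - 256) * 2 = 88
    short_count = n - long_count                 # e.g. 300 - 88 = 212
    return short_bits, short_count, long_count


short_bits, short_count, long_count = split_code_lengths(300)
print(short_bits, short_count, long_count)
```

The two groups exactly fill the code space: 212/256 + 88/512 = 1, so no code is wasted and none is missing.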

    This version scales up to 32 bits, meaning basically no restriction.
