I want to compress many 32bit number using huffman compression.
Each number may appear multiple times, and I know that every number will be replaced with some bit se
Huffman is about compression, and compression requires a "skewed" distribution to work (assuming we are talking about normal, order-0, entropy).
The worst situation regarding Huffman tree depth is when the algorithm creates a degenerated tree, i.e. with only one leaf per level. This situation can happen if the distribution looks like a Fibonacci serie.
Therefore, the worst distribution sequence looks like this : 1, 1, 1, 2, 3, 5, 8, 13, ....
In this case, you fill the full 32-bit tree with only 33 different elements.
Note, however, that to reach a 32 bit-depth with only 33 elements, the most numerous element must appear 3 524 578 times.
Therefore, since suming all Fibonacci numbers get you 5 702 886, you need to compress at least 5 702 887 numbers to start having a risk of not being able to represent them with a 32-bit huffman tree.
That being said, using an Huffman tree to represent 32-bits numbers requires a considerable amount of memory to calculate and maintain the tree.
[Edit] A simpler format, called "logarithm approximation", gives almost the same weight to all symbols. In this case, only the total number of symbols is required.
It computes very fast : say for 300 symbols, you will have some using 8 bits, and others using 9 bits. The formula to decide how many of each type :
9 bits : (300-256)*2 = 44*2 = 88 ; 8 bits : 300 - 88 = 212
Then you can distribute the numbers as you wish (preferably the most frequent ones using 8 bits, but that's not important).
This version scales up to 32 bits, meaning basically no restriction.