I have an information retrieval application that creates bit arrays on the order of 10s of million bits. The number of \"set\" bits in the array varies widely, from all clear to
One more compression thought:
If the bit array is not crazy long, you could try applying the Burrows-Wheeler transform before using any repetition encoding, such as Huffman. A naive implementation would take O(n^2) memory during (de)compression and O(n^2 log n) time to decompress - there are almost certainly shortcuts to be had, as well. But if there's any sequential structure to your data at all, this should really help the Huffman encoding out.
You could also apply that idea to one block at a time to keep the time/memory usage more practical. Using one block at time could allow you to always keep most of the data structure compressed if you're reading/writing sequentially.