Why is the internal data of BitSet in Java stored as long[] instead of int[]?

Question

In Java, the internal data of BitSet is stored as long[] instead of int[], and I want to know why. Here is the code in the JDK:

 /**
 * The internal field corresponding to the serialField "bits".
 */
 private long[] words;

If it's all about performance, I wonder why long[] storage would perform better.

Answer 1:

When querying or manipulating a single bit, there is no significant difference. You have to calculate the word index and read that word and, in case of an update, manipulate one bit of that word and write it back. That’s all the same for int[] and long[].
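As a rough illustration of the point above, here is a minimal sketch of single-bit access over both array types. The class and method names are hypothetical and this is not the JDK implementation; it only shows that the steps (compute the word index, mask, read or write) are the same, with a shift of 6 for long words and 5 for int words.

```java
class SingleBitDemo {
    // long[] version: 64 bits per word, so the word index is bitIndex / 64.
    static void setBit(long[] words, int bitIndex) {
        // For a long operand, Java uses only the low 6 bits of the shift count.
        words[bitIndex >> 6] |= 1L << bitIndex;
    }

    static boolean getBit(long[] words, int bitIndex) {
        return (words[bitIndex >> 6] & (1L << bitIndex)) != 0;
    }

    // int[] version: 32 bits per word, so the word index is bitIndex / 32.
    static void setBit(int[] words, int bitIndex) {
        // For an int operand, Java uses only the low 5 bits of the shift count.
        words[bitIndex >> 5] |= 1 << bitIndex;
    }

    static boolean getBit(int[] words, int bitIndex) {
        return (words[bitIndex >> 5] & (1 << bitIndex)) != 0;
    }

    public static void main(String[] args) {
        long[] asLongs = new long[2]; // room for 128 bits
        int[] asInts = new int[4];    // room for 128 bits
        setBit(asLongs, 70);
        setBit(asInts, 70);
        System.out.println(getBit(asLongs, 70) + " " + getBit(asInts, 70));
        System.out.println(getBit(asLongs, 6) + " " + getBit(asInts, 6));
    }
}
```

Both variants do one array read and, for an update, one array write; only the shift amount and the word width differ.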

One could argue that using a long instead of an int raises the amount of memory that has to be transferred for a single-bit operation if you have a true 32-bit memory bus, but since Java was designed in the 1990s, the designers decided that this was no longer an issue.

On the other hand, you get a big win when processing multiple bits at once. When you perform operations like and, or, or xor on an entire BitSet, a long array lets you process an entire word, that is, 64 bits, in a single operation.
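The bulk case can be sketched as a word-by-word loop. The helper below is a hypothetical stand-in for an operation like BitSet.and, written over a bare long[]; it is not the JDK code. Each loop iteration combines 64 bits, so an int[] holding the same bits would need twice as many iterations.

```java
class BulkAndDemo {
    // In-place AND of two word arrays, one 64-bit word per iteration.
    static void andInPlace(long[] target, long[] other) {
        int common = Math.min(target.length, other.length);
        for (int i = 0; i < common; i++) {
            target[i] &= other[i];
        }
        // Bits that 'other' does not have at all count as zero, so clear them.
        for (int i = common; i < target.length; i++) {
            target[i] = 0L;
        }
    }

    public static void main(String[] args) {
        long[] a = {0b1100L, -1L};
        long[] b = {0b1010L};
        andInPlace(a, b);
        System.out.println(Long.toBinaryString(a[0]) + " " + a[1]);
    }
}
```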

Similarly, when searching for the next set bit, if the bit is not within the word at the start position, subsequent words are first tested against zero. That test is an intrinsic operation, even on most 32-bit CPUs, so you can skip 64 zero bits at once. The first non-zero word will definitely contain the next set bit, so only one bit extraction operation is needed for the entire iteration.
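A minimal sketch of that search, assuming a non-negative start index and a bare long[] rather than a real BitSet (the class and method names are hypothetical): whole words are compared against zero, and Long.numberOfTrailingZeros is called once, on the first non-zero word.

```java
class NextSetBitDemo {
    // Returns the index of the first set bit at or after fromIndex, or -1.
    // Assumes fromIndex >= 0.
    static int nextSetBit(long[] words, int fromIndex) {
        int u = fromIndex >> 6;
        if (u >= words.length) {
            return -1;
        }
        // Mask off the bits below fromIndex in the starting word.
        long word = words[u] & (-1L << fromIndex);
        while (true) {
            if (word != 0) {
                // Single bit-extraction step for the whole search.
                return (u * 64) + Long.numberOfTrailingZeros(word);
            }
            if (++u == words.length) {
                return -1;
            }
            word = words[u]; // skips 64 zero bits per iteration
        }
    }

    public static void main(String[] args) {
        long[] words = {0L, 0L, 1L << 5};
        System.out.println(nextSetBit(words, 3));
    }
}
```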

These benefits for bulk operations outweigh any single-bit drawbacks, if there are any. As said, most of today’s CPUs are capable of doing all operations on 64-bit words directly.

Answer 2:

On 64-bit machines, performing a bitwise operation on a single long value is significantly faster than performing the same operation on two int values, as 64-bit values are directly supported by the hardware. On 32-bit machines the difference is probably not very significant.

Answer 3:

Based on a cursory reading of the source here, the main reason seems to be purely performance. This is the comment from the source:

BitSets are packed into arrays of "words." Currently a word is a long, which consists of 64 bits, requiring 6 address bits. The choice of word size is determined purely by performance concerns.

Answer 4:

Surely it is an optimization issue: a single long value stores up to 64 bits, and an int only 32. So any bit set of up to 64 bits needs only one entry in the array. With an array of int, anything over 32 bits would need two entries, which is slower and heavier to maintain.
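The arithmetic behind that can be sketched with a small hypothetical helper (not a JDK method) that rounds the bit count up to whole words. It assumes small positive bit counts, so the addition does not overflow.

```java
class WordCountDemo {
    // Number of array entries needed to hold nbits bits, rounding up.
    static int wordsNeeded(int nbits, int bitsPerWord) {
        return (nbits + bitsPerWord - 1) / bitsPerWord;
    }

    public static void main(String[] args) {
        System.out.println(wordsNeeded(64, 64)); // entries in a long[]
        System.out.println(wordsNeeded(64, 32)); // entries in an int[]
    }
}
```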

Answer 5:

I might be wrong, but with long[] the number of bits a BitSet can hold is much bigger than with int[], because the maximum array length is about the same for both (and in practice limited by heap size).

Source: https://stackoverflow.com/questions/32110554/why-is-the-internal-data-of-bitset-in-java-stored-as-long-instead-of-int-in

Tags

java

performance

bitset