Question
What initial capacity should I use for a HashSet into which I know I am going to insert 1000 integers, so that no internal rebuilds are needed?
At first I thought that I should use 1000, but the description of the constructor that takes the initialCapacity parameter says: "Constructs a new, empty set; the backing HashMap instance has the specified initial capacity and default load factor (0.75)."
So if I set the capacity to 1000, will the HashMap resize when it reaches 750 elements?
Also, I assume that some spare "space" is required for the HashMap to work efficiently, so solving IC * 0.75 = 1000 to get something like 1334 might not be the best solution either, or is it?
UPDATE:
1) I am aware that the impact of an internal resize is not significant, but it is still a chance to learn and better understand the environment I am using, and the effort should be minimal.
2) Several comments were made regarding the choice of data structure. Please have a look at my previous question, "Data structure recommendation", where more exact information about my scenario is provided.
Answer 1:
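You need an initial capacity of size/load-factor to avoid a resize. Note: the actual table size will always be rounded up to the next power of 2 for HashSet and HashMap. For 1000 elements and the default load factor of 0.75, that is 1000/0.75, or roughly 1334, which is rounded up to 2048.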
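The arithmetic above can be sketched in a few lines of Java. This is only an illustration: the class and method names (HashSetCapacitySketch, initialCapacityFor, roundUpToPowerOfTwo) are made up for this example and are not part of the JDK, and the power-of-two rounding mirrors what HashMap does internally rather than calling into it.

```java
import java.util.HashSet;
import java.util.Set;

final class HashSetCapacitySketch {

    /** Smallest capacity for which expectedSize entries stay within capacity * loadFactor. */
    static int initialCapacityFor(int expectedSize, float loadFactor) {
        return (int) Math.ceil(expectedSize / (double) loadFactor);
    }

    /** Next power of two >= capacity (assumes 0 < capacity <= 2^30). */
    static int roundUpToPowerOfTwo(int capacity) {
        int highest = Integer.highestOneBit(capacity);
        return (highest == capacity) ? capacity : highest << 1;
    }

    public static void main(String[] args) {
        int requested = initialCapacityFor(1000, 0.75f);   // 1334
        int tableSize = roundUpToPowerOfTwo(requested);    // 2048
        System.out.println(requested + " -> table of " + tableSize);

        // Asking for 1000 gives a 1024-slot table whose threshold is 768,
        // so a resize would happen before the 1000th element is added.
        System.out.println(roundUpToPowerOfTwo(1000) * 0.75);

        // Pre-sized set: 1000 insertions fit without growing the table.
        Set<Integer> set = new HashSet<>(requested);
        for (int i = 0; i < 1000; i++) {
            set.add(i);
        }
        System.out.println(set.size());
    }
}
```

Because of the rounding, any requested capacity from 1334 up to 2048 ends up with the same 2048-slot table, so there is little to gain from fine-tuning the number beyond size/load-factor.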
Answer 2:
For your case, it is reasonable to set the initial capacity to 1000 and the load factor to 1, as two different Integers will never share the same hash code (which is the int value itself).
Nevertheless, for general-purpose use you should not really care about the load factor and should leave it at its default, as you will probably never notice any improvement from setting it yourself. Increasing the load factor may actually lead to a dramatic decrease in performance.
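A minimal sketch of the load-factor-1 suggestion follows. It uses the real two-argument HashSet constructor; the helper bucketIndex is hypothetical and only imitates the index computation used by OpenJDK 8 and later (hash spreading followed by a mask), which is an implementation detail that may differ in other versions. It is included to show that distinct hash codes do not by themselves guarantee distinct buckets.

```java
import java.util.HashSet;
import java.util.Set;

final class LoadFactorOneSketch {

    /** Builds a set sized for count elements with load factor 1 and fills it with 0..count-1. */
    static Set<Integer> fill(int count) {
        // Capacity 1000 is rounded up to a 1024-slot table; with load factor 1
        // the resize threshold is 1024, so 1000 elements fit without a resize.
        Set<Integer> set = new HashSet<>(count, 1.0f);
        for (int i = 0; i < count; i++) {
            set.add(i);
        }
        return set;
    }

    /** Hypothetical helper: bucket index as computed by OpenJDK 8+ HashMap (assumption). */
    static int bucketIndex(int value, int tableLength) {
        int h = Integer.hashCode(value);              // equals the int value itself
        return (h ^ (h >>> 16)) & (tableLength - 1);  // spread the high bits, then mask
    }

    public static void main(String[] args) {
        System.out.println(fill(1000).size());
        // 0 and 1024 have different hash codes but land in the same bucket of a 1024-slot table.
        System.out.println(bucketIndex(0, 1024) + " " + bucketIndex(1024, 1024));
    }
}
```

For the values 0 to 999 in a 1024-slot table every element gets its own bucket, which is why the suggestion works well for this particular input; arbitrary integers can still collide.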
Answer 3:
If it's really worth worrying about this (and I suspect it's not - resizing a set of 1000 integers won't take long), then bear in mind that HashSet is backed by a HashMap, and the put method calls this:
    void addEntry(int hash, K key, V value, int bucketIndex) {
        Entry<K,V> e = table[bucketIndex];
        table[bucketIndex] = new Entry<K,V>(hash, key, value, e);
        // threshold is capacity * loadFactor; the table doubles once size reaches it
        if (size++ >= threshold)
            resize(2 * table.length);
    }
It's always worth checking out the source code for such queries, although bear in mind the implementation may always change (even for minor JRE releases).
Finally, is a set appropriate for this scenario? If you have a fixed number of integers, perhaps a simple array (using primitives and thus avoiding boxing) would be faster and simpler.
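One way to read the array suggestion, sketched under the assumption that all values are known up front and only membership tests are needed afterwards: keep them in a sorted int[] and use binary search. The class name IntArraySetSketch is hypothetical, and this is just one possible array-based layout, not necessarily the one the answer had in mind.

```java
import java.util.Arrays;

final class IntArraySetSketch {

    // Sorted, de-duplicated primitive values; no Integer boxing involved.
    private final int[] sorted;

    IntArraySetSketch(int[] values) {
        this.sorted = Arrays.stream(values).sorted().distinct().toArray();
    }

    /** Membership test in O(log n) via binary search on the sorted array. */
    boolean contains(int value) {
        return Arrays.binarySearch(sorted, value) >= 0;
    }

    int size() {
        return sorted.length;
    }

    public static void main(String[] args) {
        IntArraySetSketch set = new IntArraySetSketch(new int[] {5, 3, 5, 9});
        System.out.println(set.contains(3));
        System.out.println(set.contains(4));
        System.out.println(set.size());
    }
}
```

The trade-off is that lookups are logarithmic rather than constant time and insertions after construction are not supported, in exchange for a compact layout with no per-element objects.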
Answer 4:
I think the ideal initial capacity would be the number of integers you want to insert, with the load factor left at the default.
Go for <# of integers>/0.75 (the default load factor).
Source: https://stackoverflow.com/questions/18308987/initial-capacity-for-a-hashsetinteger