What load factor should be used when you know the maximum possible number of elements in a HashSet?

Submitted by 孤街醉人 on 2019-12-10 04:17:29

Question


What load factor should I use when I know the maximum possible number of elements in a HashSet? I've heard that the default load factor of 0.75 is recommended because it offers a good trade-off between speed and space. Is this correct? However, a larger HashSet also takes more time to create and more space.

I am using a HashSet just in order to remove duplicate integers from a list of integers.
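For context, the sizing arithmetic works out as follows: with the default load factor of 0.75, a table whose initial capacity is at least ceil(n / 0.75) never rehashes while holding n elements. A minimal sketch (the class name and sample values are illustrative, not from the question):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SizedDedup {
    public static void main(String[] args) {
        List<Integer> list = Arrays.asList(1, 2, 2, 3, 3, 3);
        // Capacity sized so the default 0.75 load factor never
        // forces a rehash while holding list.size() elements.
        int capacity = (int) Math.ceil(list.size() / 0.75);
        Set<Integer> unique = new HashSet<Integer>(capacity);
        unique.addAll(list);
        System.out.println(unique.size()); // 3 distinct values
    }
}
```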


Answer 1:


I spent some time playing around with load factors once, and it is shocking how little difference that setting really makes in practice. Even setting it to something high like 2.0 doesn't slow things down much, nor does it save that much memory. Just pretend it doesn't exist. Josh has often regretted ever exposing it as an option at all.




Answer 2:


For your stated problem, instead of using a HashSet, you might also consider a BitSet.

Depending on the range and sparsity of your integers, you may get better performance and space characteristics.
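The BitSet approach can be sketched as follows. Note that java.util.BitSet only handles non-negative indices, so this sketch assumes the integers are >= 0 (the class name and sample values are mine, not the answerer's):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

public class BitSetDedup {
    public static void main(String[] args) {
        List<Integer> input = Arrays.asList(7, 3, 7, 1, 3);
        // One bit per possible value: very compact when the value
        // range is small or dense. Assumes non-negative integers.
        BitSet seen = new BitSet();
        List<Integer> unique = new ArrayList<Integer>();
        for (int v : input) {
            if (!seen.get(v)) { // constant-time membership test
                seen.set(v);
                unique.add(v);
            }
        }
        System.out.println(unique); // [7, 3, 1]
    }
}
```

Unlike a HashSet, this also preserves the first-seen order of the values for free.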




Answer 3:


It depends a lot on your integers. The point of the load factor is to "balance" the hash function: with a "perfect" hash function, your load factor could be 1.0. However, if the integer values in question show any sort of regularity, this may result in more hash collisions than average, which decreases the efficiency of the map. In that case a lower load factor may help to spread the values a bit better (over a larger range), thus reducing hash collisions.

I wouldn't worry much about the creation time and extra space taken by using a lower load factor - I doubt you would ever notice the difference (unless you are on a platform with limited hardware, or have several millions of integers in your map - then the size difference may become noticeable, roughly on the range of a few extra megabytes per million values).
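If you do want to experiment with a lower load factor, HashSet exposes a two-argument constructor for exactly this. A minimal sketch (the capacity, load factor, and sample values here are arbitrary choices for illustration):

```java
import java.util.HashSet;
import java.util.Set;

public class LoadFactorDemo {
    public static void main(String[] args) {
        // HashSet(int initialCapacity, float loadFactor): a lower
        // load factor spreads entries across more buckets, trading
        // extra memory for fewer collisions.
        Set<Integer> sparse = new HashSet<Integer>(1024, 0.5f);
        for (int i = 0; i < 500; i++) {
            sparse.add(i * 16); // regularly spaced values
        }
        System.out.println(sparse.size()); // 500
    }
}
```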




Answer 4:


If you know EXACTLY how many you should have, you should put the load factor at 1 and ensure your hash function maps 1:1. You may want to extend your container to not rehash your hash.

Note that this sort of 'exact' thing tends to change over time, so you might be best off with just your normal container. :)

EDIT: My answer was before I knew it was integers.

Yeah, your best bet is to just leave as it is. You'll never notice the difference.

/**
 * Remove duplicates from a list.
 * @note This will ALTER the list.
 * @note This is not thread safe.
 * @param list the list (potentially with duplicates)
 */
void removeDuplicates(List<Integer> list) {
    Set<Integer> noDupe = new HashSet<Integer>(list.size()); // may resize once, oh well
    noDupe.addAll(list);
    list.clear();
    list.addAll(noDupe);
}


Source: https://stackoverflow.com/questions/5291780/what-load-factor-should-be-used-when-you-know-maximum-possible-no-of-elements-in
