How does a hash table work?

后端 未结 15 2288
盖世英雄少女心
盖世英雄少女心 2020-11-22 09:32

I\'m looking for an explanation of how a hash table works - in plain English for a simpleton like me!

For example, I know it takes the key, calculates the hash (I a

15条回答
  •  没有蜡笔的小新
    2020-11-22 10:10

    You guys are very close to explaining this fully, but missing a couple things. The hashtable is just an array. The array itself will contain something in each slot. At a minimum you will store the hashvalue or the value itself in this slot. In addition to this you could also store a linked/chained list of values that have collided on this slot, or you could use the open addressing method. You can also store a pointer or pointers to other data you want to retrieve out of this slot.

    It's important to note that the hashvalue itself generally does not indicate the slot into which to place the value. For example, a hashvalue might be a negative integer value. Obviously a negative number cannot point to an array location. Additionally, hash values will tend to many times be larger numbers than the slots available. Thus another calculation needs to be performed by the hashtable itself to figure out which slot the value should go into. This is done with a modulus math operation like:

    uint slotIndex = hashValue % hashTableSize;
    

    This value is the slot the value will go into. In open addressing, if the slot is already filled with another hashvalue and/or other data, the modulus operation will be run once again to find the next slot:

    slotIndex = (remainder + 1) % hashTableSize;
    

    I suppose there may be other more advanced methods for determining slot index, but this is the common one I've seen... would be interested in any others that perform better.

    With the modulus method, if you have a table of say size 1000, any hashvalue that is between 1 and 1000 will go into the corresponding slot. Any Negative values, and any values greater than 1000 will be potentially colliding slot values. The chances of that happening depend both on your hashing method, as well as how many total items you add to the hash table. Generally, it's best practice to make the size of the hashtable such that the total number of values added to it is only equal to about 70% of its size. If your hash function does a good job of even distribution, you will generally encounter very few to no bucket/slot collisions and it will perform very quickly for both lookup and write operations. If the total number of values to add is not known in advance, make a good guesstimate using whatever means, and then resize your hashtable once the number of elements added to it reaches 70% of capacity.

    I hope this has helped.

    PS - In C# the GetHashCode() method is pretty slow and results in actual value collisions under a lot of conditions I've tested. For some real fun, build your own hashfunction and try to get it to NEVER collide on the specific data you are hashing, run faster than GetHashCode, and have a fairly even distribution. I've done this using long instead of int size hashcode values and it's worked quite well on up to 32 million entires hashvalues in the hashtable with 0 collisions. Unfortunately I can't share the code as it belongs to my employer... but I can reveal it is possible for certain data domains. When you can achieve this, the hashtable is VERY fast. :)

提交回复
热议问题