Hashtable/Dictionary collisions

孤街浪徒 2021-01-06 11:39

Using only the standard English letters and the underscore, what is the maximum number of characters a key can have without any potential collision in a hashtable/dictionary?

5 answers
  •  無奈伤痛
    2021-01-06 11:50

    Given a perfect hashing function (which you're not typically going to have, as others have mentioned), you can find the maximum possible number of characters that guarantees no two strings will produce a collision, as follows:


    No. of unique hash codes available = 2 ^ 32 = 4294967296 (assuming a 32-bit integer is used for hash codes)

    Size of character set = 2 * 26 + 1 = 53 (26 lowercase and 26 uppercase letters of the Latin alphabet, plus underscore)

    Next, consider that the strings of length l or less have at most 54 ^ l representations in total. The base is 54 rather than 53 because the string can terminate after any character, which adds one extra possibility per position. This slightly overcounts the exact total of 53 ^ 0 + 53 ^ 1 + ... + 53 ^ l, but not by enough to greatly affect the result.

    Taking the no. of unique hash codes as your maximum number of string representations, you get the following simple equation:

    54 ^ l = 2 ^ 32

    And solving it:

    log2 (54 ^ l) = 32
    l * log2 54 = 32
    l = 32 / log2 54 = 5.56
    

    (Where log2 is the logarithm function of base 2.)

    Since string lengths clearly can't be fractional, you take the integral part to give a maximum length of just 5. Very short indeed, but observe that this restriction would prevent even the remotest chance of a collision given a perfect hash function.
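    As a quick check of the arithmetic above, here is a minimal Python sketch. The function names are made up for illustration. It computes the estimate from the 54 ^ l bound and, for comparison, the exact number of strings of length 0 through l over the 53-character set, still assuming a perfect hash onto 2 ^ 32 codes.

```python
import math

ALPHABET_SIZE = 2 * 26 + 1  # 26 lowercase + 26 uppercase + underscore = 53
HASH_CODES = 2 ** 32        # assumes 32-bit hash codes


def max_length_estimate(alphabet_size=ALPHABET_SIZE, hash_codes=HASH_CODES):
    """Largest l with (alphabet_size + 1) ** l <= hash_codes (the 54 ^ l bound)."""
    return math.floor(math.log2(hash_codes) / math.log2(alphabet_size + 1))


def count_strings_up_to(length, alphabet_size=ALPHABET_SIZE):
    """Exact number of distinct strings of length 0..length."""
    return sum(alphabet_size ** k for k in range(length + 1))


def max_length_exact(alphabet_size=ALPHABET_SIZE, hash_codes=HASH_CODES):
    """Largest l such that every string of length <= l could get its own hash code."""
    length = 0
    while count_strings_up_to(length + 1, alphabet_size) <= hash_codes:
        length += 1
    return length


print(max_length_estimate())   # estimate from the 54 ^ l bound
print(max_length_exact())      # exact count gives the same limit
print(count_strings_up_to(5))  # strings of length <= 5
```

    Both approaches give 5: there are 426237714 strings of length 5 or less, which fits within 2 ^ 32, while allowing length 6 pushes the total past 2 ^ 32.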


    As I've mentioned, however, this is largely theoretical, and I'm not sure how much use it would be as a design consideration for anything. That said, hopefully it helps you understand the matter from a theoretical viewpoint, on top of which you can add the practical considerations (e.g. non-perfect hash functions, non-uniformity of distribution).
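    To illustrate the point about non-perfect hash functions, here is a small Python sketch of a polynomial string hash with multiplier 31, the formula Java documents for String.hashCode. It is only an example and is not claimed to be the hash your particular hashtable/dictionary uses. The function name is made up, and the result is masked to an unsigned 32-bit value, whereas Java returns a signed int.

```python
def java_style_hash(s: str) -> int:
    """Polynomial hash h = 31 * h + code(ch), truncated to 32 bits (unsigned here)."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h


# Two different two-letter keys that hash to the same value.
print(java_style_hash("Aa"), java_style_hash("BB"))

# Concatenating colliding blocks produces further collisions.
print(java_style_hash("AaBB"), java_style_hash("BBAa"))
```

    With this function, "Aa" and "BB" already collide at length 2, well below the theoretical limit of 5, so in practice a hashtable has to handle collisions rather than rely on key length to avoid them.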
