Probability of getting a duplicate value when calling GetHashCode() on strings

遥遥无期 2020-11-30 10:49

I want to know the probability of getting duplicate values when calling the GetHashCode() method on string instances. For instance, according to th

6 answers
  •  伪装坚强ぢ
    2020-11-30 11:29

    In case your question is really about the probability of a collision within a group of strings:

    For n available slots and m items, each equally likely to land in any slot:
    Prob. of no collision on first insertion is 1.
    Prob. of no collision on 2nd insertion is ( n - 1 ) / n
    Prob. of no collision on 3rd insertion is ( n - 2 ) / n
    Prob. of no collision on mth insertion is ( n - ( m - 1 ) ) / n

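    The probability of no collision after m insertions is the product of the above values: n!/((n - m)! * n^m), which equals (n - 1)!/((n - m)! * n^(m - 1)).

    Equivalently, it is m! * (n choose m) / n^m, and the probability of at least one collision is 1 minus this value.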
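The derivation can be checked numerically. The following is a minimal sketch; the function names are hypothetical and chosen only for this example. It multiplies the per-insertion probabilities exactly with `fractions.Fraction` and compares the result with both closed forms. It then cross-checks with a seeded Monte Carlo simulation, which assumes each item lands in a uniformly random slot. The parameters n = 365, m = 23 are the classic birthday-problem case.

```python
import random
from fractions import Fraction
from math import comb, factorial


def p_no_collision_product(n: int, m: int) -> Fraction:
    """Product of the per-insertion terms (n - (k - 1)) / n for k = 1..m."""
    p = Fraction(1)
    for k in range(1, m + 1):
        p *= Fraction(n - (k - 1), n)
    return p


def p_no_collision_factorial(n: int, m: int) -> Fraction:
    """Closed form (n - 1)! / ((n - m)! * n^(m - 1)); requires 1 <= m <= n."""
    return Fraction(factorial(n - 1), factorial(n - m) * n ** (m - 1))


def p_no_collision_choose(n: int, m: int) -> Fraction:
    """Equivalent form m! * C(n, m) / n^m; C(n, m) is 0 when m > n."""
    return Fraction(factorial(m) * comb(n, m), n ** m)


def p_no_collision_simulated(n: int, m: int, trials: int = 20_000, seed: int = 1) -> float:
    """Monte Carlo estimate: drop m items into uniformly random slots many times."""
    rng = random.Random(seed)
    ok = 0
    for _ in range(trials):
        slots = {rng.randrange(n) for _ in range(m)}
        if len(slots) == m:  # all m items landed in distinct slots
            ok += 1
    return ok / trials


n, m = 365, 23  # classic birthday-problem parameters
exact = p_no_collision_product(n, m)
assert exact == p_no_collision_factorial(n, m) == p_no_collision_choose(n, m)
print(f"exact:     {float(exact):.4f}")
print(f"simulated: {p_no_collision_simulated(n, m):.4f}")
```

The exact value is about 0.4927. So with only 23 items in 365 slots, a collision is already slightly more likely than not.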

    And everybody is right: you can't assume zero collisions. Saying the probability is "low" may be true, but it doesn't let you assume there will be none. If you're looking at a hashtable, I think the standard rule of thumb is that significant collisions start causing trouble when your hashtable is about two-thirds full.
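To connect the formula to GetHashCode(): the method returns an int, so there are at most 2^32 distinct values. The sketch below treats those values as n = 2^32 equally likely slots. That is an idealizing assumption, since real hash functions are not perfectly uniform. It evaluates the probability of at least one collision among m distinct strings in two ways:

- exactly, summing in log space with `log1p`/`expm1` to keep precision when probabilities are tiny;
- with the standard birthday approximation 1 - exp(-m(m - 1) / (2n)).

`HASH_SPACE`, `p_collision_exact`, and `p_collision_approx` are hypothetical names used only for illustration.

```python
import math

HASH_SPACE = 2 ** 32  # assumption: hash codes are uniform over all 32-bit int values


def p_collision_exact(m: int, n: int = HASH_SPACE) -> float:
    """P(at least one collision) = 1 - prod_{k=0}^{m-1} (1 - k/n), computed in log space."""
    if m > n:
        return 1.0  # pigeonhole: more items than possible values
    log_no_collision = math.fsum(math.log1p(-k / n) for k in range(m))
    return -math.expm1(log_no_collision)


def p_collision_approx(m: int, n: int = HASH_SPACE) -> float:
    """Birthday approximation 1 - exp(-m(m - 1) / (2n)), good when m is much smaller than n."""
    return -math.expm1(-m * (m - 1) / (2 * n))


# Number of strings at which a collision becomes roughly as likely as not.
m_half = round(math.sqrt(2 * math.log(2) * HASH_SPACE))

for m in (1_000, 10_000, m_half, 100_000):
    print(f"{m:>7} strings: exact {p_collision_exact(m):.6f}, approx {p_collision_approx(m):.6f}")
```

Under this uniformity assumption, roughly 77,000 strings already give about even odds of a duplicate hash code. These figures concern full 32-bit hash values. A hashtable typically reduces each hash code to a bucket index, so bucket collisions happen much sooner than full hash collisions.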
