I want to know the probability of getting duplicate values when calling the GetHashCode() method on string instances. For instance, according to th
In case your question is really about the probability of a collision within a group of strings, here is the calculation.
For n available slots and m items inserted (assuming every hash value is equally likely):
Prob. of no collision on first insertion is 1.
Prob. of no collision on 2nd insertion is ( n - 1 ) / n
Prob. of no collision on 3rd insertion is ( n - 2 ) / n
Prob. of no collision on mth insertion is ( n - ( m - 1 ) ) / n
The probability of no collision after m insertions is the product of the above values: (n - 1)! / ((n - m)! * n^(m - 1)),
which is the same as n! / ((n - m)! * n^m), or m! * ( n choose m ) / ( n^m ).
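To put numbers on this, here is a rough sketch in Python (not .NET code) that evaluates the product above. It assumes the hash is uniformly distributed over n = 2^32 values, since GetHashCode() returns a 32-bit int; real string hashes are only approximately uniform, so treat the results as estimates. The function names are made up for illustration.

```python
import math


def prob_no_collision(n: int, m: int) -> float:
    """Probability that m items placed uniformly at random into n slots are all distinct."""
    if m > n:
        return 0.0  # pigeonhole: more items than slots guarantees a collision
    # Product of (n - i) / n for i = 0 .. m-1, computed in log space so that
    # a long product of factors very close to 1 stays numerically accurate.
    log_p = sum(math.log1p(-i / n) for i in range(m))
    return math.exp(log_p)


def prob_collision(n: int, m: int) -> float:
    """Probability of at least one collision among m items in n slots."""
    return 1.0 - prob_no_collision(n, m)


if __name__ == "__main__":
    n = 2 ** 32  # assumption: one slot per possible 32-bit hash value
    for m in (1_000, 10_000, 77_163, 100_000):
        print(f"m = {m:>7}: P(collision) = {prob_collision(n, m):.4f}")
```

Under that uniformity assumption, the chance of at least one duplicate hash reaches roughly 50% at around 77,000 strings, which is far fewer than the 2^32 possible values might suggest.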
And everybody is right: you can't assume 0 collisions. Saying the probability is "low" may be true, but it doesn't allow you to assume that there will be no collisions. If you're looking at a hashtable, I think the rule of thumb is that you begin to have trouble with significant collisions when your hashtable is about 2/3 full.