What's the shortest pair of strings that causes an MD5 collision?

名媛妹妹 2020-12-12 16:01

Up to what string length is it possible to use MD5 as a hash without having to worry about the possibility of a collision?

This would presumably be calculated by gen

3 Answers
  •  伪装坚强ぢ
    2020-12-12 16:56

    The mathematics of the birthday paradox puts the inflection point of the collision probability at roughly $\sqrt{N}$ inputs, where $N$ is the number of distinct bins (possible outputs) of the hash function. For a 128-bit hash, $N = 2^{128}$ and $\sqrt{N} = 2^{64}$, so once you hash around $2^{64}$ distinct inputs (every possible 64-bit string) you are moderately likely to have one collision. So my guess is that the complete set of 8-byte strings is somewhat likely to contain a collision, and the set of 9-byte strings is extremely likely to.
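    To see the $\sqrt{N}$ rule without doing $2^{64}$ work, here is a scaled-down sketch. It keeps only the top `bits` bits of each MD5 digest (a truncation made up for illustration, not how MD5 is normally used) and hashes distinct 8-byte inputs until two truncated digests match. Under the random-mapping assumption, the count should land near $2^{bits/2}$; the expected value is about $\sqrt{\pi N / 2} \approx 1.25 \cdot 2^{bits/2}$. `find_truncated_collision` is a hypothetical helper name.

```python
import hashlib
from typing import Optional, Tuple


def find_truncated_collision(bits: int, max_tries: int = 1 << 24) -> Optional[Tuple[bytes, bytes, int]]:
    """Hash the 8-byte inputs 0, 1, 2, ... with MD5, keep only the top `bits`
    bits of each digest, and return the first two inputs whose truncated
    digests are equal, plus how many inputs were hashed in total."""
    seen = {}  # truncated digest -> input that produced it
    for i in range(max_tries):
        msg = i.to_bytes(8, "big")
        digest = int.from_bytes(hashlib.md5(msg).digest(), "big")
        key = digest >> (128 - bits)  # keep the top `bits` of the 128-bit digest
        if key in seen:
            return seen[key], msg, i + 1
        seen[key] = msg
    return None  # no collision within max_tries


for bits in (16, 24, 32):
    result = find_truncated_collision(bits)
    if result is not None:
        a, b, tries = result
        print(f"{bits}-bit prefix: {a.hex()} and {b.hex()} collide after {tries} inputs "
              f"(2^{bits // 2} = {2 ** (bits // 2)})")
```

The exact counts depend on the input sequence chosen; what matters is the roughly square-root scaling as `bits` grows.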

    Edit: this assumes that MD5 maps input bytestrings to output hashes in a way that is close to "random" (as opposed to a mapping that distributes strings more evenly among the set of possible hashes, in which case the answer would be closer to 16 bytes).
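    The "closer to 16 bytes" figure is the pigeonhole bound: a hypothetical, perfectly even hash could give every string its own digest only while the number of strings is at most $2^{128}$. The sketch below (with a made-up helper name, and assuming such an ideal hash, which MD5 is not designed to be) finds that length.

```python
HASH_BITS = 128  # MD5 digest size in bits


def max_injective_length(hash_bits: int = HASH_BITS, alphabet_size: int = 256) -> int:
    """Largest string length n with alphabet_size**n <= 2**hash_bits.
    One symbol longer and the pigeonhole principle forces a collision,
    no matter how evenly the hash spreads its inputs."""
    n = 0
    while alphabet_size ** (n + 1) <= 2 ** hash_bits:
        n += 1
    return n


print(max_injective_length())  # 16, since 256**16 == 2**128
```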

    Also for a more specific numerical answer, if you look at one of the approximations for calculating collision probability, you get

    $p(k) \approx 1 - e^{-k(k-1)/(2 \cdot 2^{128})}$, where $k = 2^m$ is the size of the space of possible inputs when the input bytestring is $m$ bits long.

    The set of 8-byte strings ($m = 64$): $p(2^{64}) \approx 1 - e^{-0.5} \approx 0.3935$

    The set of 9-byte strings ($m = 72$): $p(2^{72}) \approx 1 - e^{-2^{144}/(2 \cdot 2^{128})} = 1 - e^{-2^{15}} = 1 - e^{-32768} \approx 1$
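    The two figures can be reproduced by evaluating the approximation directly. This is a sketch with a hypothetical function name; it uses `math.expm1` so that $1 - e^{-x}$ stays accurate when $x$ is tiny (e.g. for 7-byte strings).

```python
import math


def collision_probability(m_bits: int, hash_bits: int = 128) -> float:
    """Birthday approximation p(k) ~= 1 - exp(-k(k-1) / (2 * 2**hash_bits))
    for hashing all k = 2**m_bits distinct m-bit inputs."""
    k = 2 ** m_bits
    # Exact big-integer arithmetic, converted to float only by the final division.
    x = k * (k - 1) / (2 * 2 ** hash_bits)
    # -expm1(-x) equals 1 - exp(-x) but avoids cancellation for small x.
    return -math.expm1(-x)


for n_bytes in (7, 8, 9):
    print(f"{n_bytes}-byte strings: p ~= {collision_probability(8 * n_bytes):.4g}")
```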

    Also note that these figures assume the complete set of $m/8$-byte strings. If you only use alphanumeric characters, you'd need longer strings to get a probable collision.
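    To put a number on "longer strings", the approximation can be inverted: with $k \approx \sqrt{2N \ln(1/(1-p))}$ inputs needed and $k = A^n$ for an alphabet of $A$ symbols, $n = \lceil \log_2 k / \log_2 A \rceil$. The sketch below assumes the same random-mapping model and replaces $k(k-1)$ with $k^2$, which is negligible at these sizes; the function name is hypothetical.

```python
import math


def min_length_for_collision(alphabet_size: int, p: float = 0.5, hash_bits: int = 128) -> int:
    """Shortest length n such that the full set of alphabet_size**n strings
    reaches collision probability >= p under the birthday approximation."""
    # Invert p ~= 1 - exp(-k^2 / (2N)) with N = 2**hash_bits, working in log2
    # to avoid huge numbers: log2 k = (hash_bits + 1 + log2(-ln(1 - p))) / 2.
    log2_k = (hash_bits + 1 + math.log2(-math.log1p(-p))) / 2
    # k = alphabet_size**n, so n = log2 k / log2(alphabet_size), rounded up.
    return math.ceil(log2_k / math.log2(alphabet_size))


for name, size in (("all bytes", 256), ("alphanumeric", 62), ("hex digits", 16)):
    print(f"{name}: {min_length_for_collision(size)} characters for a 50% chance")
```

With all 256 byte values this agrees with the 9-byte figure above; restricting to the 62 alphanumeric characters pushes it to 11 characters.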
