md5 hash collisions.

落花浮王杯 submitted on 2020-01-12 14:26:35

Question


If counting from 1 to X, where X is the first number to have an md5 collision with a previous number, what number is X?

If I use MD5 for serial numbers, I want to know how many units I can expect to enumerate before I get a collision.


Answer 1:


Theoretically, you can expect collisions for X around 2^64. For a hash function with an output of n bits, first collisions appear when you have accumulated about 2^(n/2) outputs (it does not matter how you choose the inputs; sequential integer values are nothing special in that respect).
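
The 2^(n/2) estimate comes from the birthday bound. As a rough sketch of the arithmetic, the standard approximation p ≈ 1 - exp(-k(k-1) / 2^(n+1)) gives the collision probability after k outputs of an ideal n-bit hash. The function names below are illustrative, and the formula assumes uniformly distributed outputs, which MD5 only approximates.

```python
import math


def collision_probability(num_hashes: int, bits: int) -> float:
    """Approximate chance of at least one collision among num_hashes
    outputs of an ideal hash with `bits` output bits (birthday bound)."""
    exponent = -num_hashes * (num_hashes - 1) / 2.0 ** (bits + 1)
    # expm1 keeps precision when the probability is extremely small
    return -math.expm1(exponent)


def hashes_for_even_odds(bits: int) -> float:
    """Number of outputs at which a collision becomes about 50% likely."""
    return math.sqrt(2.0 * math.log(2.0)) * 2.0 ** (bits / 2)


if __name__ == "__main__":
    # MD5 has a 128-bit output
    print(hashes_for_even_odds(128))
    print(collision_probability(10**9, 128))
```

Under this model, a billion serial numbers hashed with a 128-bit function have a vanishingly small chance of colliding.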

Of course, MD5 has been shown not to be a good hash function. Also, the 2^(n/2) figure is only an average. So why don't you try it? Take an MD5 implementation, hash your serial numbers, and see if you get a collision. A basic MD5 implementation should be able to hash a few million values per second, and, with a reasonable hard disk, you could accumulate a few billion outputs, sort them, and see if there is a collision.
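
A scaled-down version of that experiment might look like the sketch below. Two things here are assumptions rather than part of the answer: each serial number is hashed as its decimal ASCII string, and the digest is truncated to its top 24 bits so that a collision shows up after a few thousand values instead of needing billions of outputs on disk. The function name is illustrative.

```python
import hashlib


def first_truncated_collision(bits: int = 24, limit: int = 1_000_000):
    """Hash 1, 2, 3, ... with MD5 and return (earlier, x) for the first x
    whose truncated digest repeats an earlier one, or None if the limit
    is reached without a collision."""
    seen = {}  # truncated digest -> serial number that produced it
    for x in range(1, limit + 1):
        digest = hashlib.md5(str(x).encode("ascii")).digest()
        # Keep only the top `bits` bits of the 128-bit digest.
        truncated = int.from_bytes(digest, "big") >> (128 - bits)
        if truncated in seen:
            return seen[truncated], x
        seen[truncated] = x
    return None


if __name__ == "__main__":
    print(first_truncated_collision())
```

With the full 128-bit digest the in-memory dictionary is no longer practical, which is why the answer suggests writing the outputs to disk and sorting them.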




Answer 2:


I can't answer your question, but what you are looking for is a UUID. UUID serial numbers can be unique for millions of products, but you might need to check a database to mitigate the tiny chance of a collision.
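
A minimal sketch of that idea, assuming random (version 4) UUIDs from Python's standard `uuid` module. The `issued` set is a hypothetical stand-in for the database check; a real system would more likely rely on a unique constraint in the database itself.

```python
import uuid

# Stand-in for a database table of serial numbers already handed out.
issued = set()


def new_serial() -> str:
    """Return a random UUID as 32 hex digits, retrying in the very
    unlikely case that it has been issued before."""
    while True:
        candidate = uuid.uuid4().hex
        if candidate not in issued:
            issued.add(candidate)
            return candidate


if __name__ == "__main__":
    print(new_serial())
```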




Answer 3:


I believe no one has tested this.

Besides, if you have a simple incrementing number, you don't need to hash it.




Answer 4:


As far as I know, there are no known MD5 collisions among the 2^32 values of a 32-bit integer.




Answer 5:


It really depends on the size of your input. A perfect hash function has collisions every (input_length / hash_length) hashes. If your input is small, collisions are fairly unlikely; so far, only a single one-block collision has been found.




Answer 6:


I realize this is an old question but I stumbled upon it, found a much better approach, and figured I'd share it.

You have an upper bound for your ordinal number N, so let's take advantage of that. Let's say N < 2^32 ≈ 4.3*10^9. Now each time you need a new identifier, you just pick a random 32-bit number R and concatenate it with R xor N (zero-pad both halves before concatenation). This yields a random-looking unique 64-bit identifier, which you can write with just 16 hexadecimal digits.

This approach prevents collisions completely because two identifiers that happen to have the same random component necessarily have distinct xor-ed components.

Bonus feature: you can split such a 64-bit identifier into two 32-bit numbers and xor them with each other to recover the original ordinal number.
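
The scheme above can be sketched as follows. The names `encode` and `decode` are illustrative, and using `secrets.randbits` as the source of R is an assumption; any 32-bit random source would fit the description.

```python
import secrets


def encode(n: int) -> str:
    """Turn ordinal n (0 <= n < 2^32) into a 16-hex-digit identifier
    made of a random 32-bit R followed by R xor n."""
    if not 0 <= n < 2**32:
        raise ValueError("ordinal must fit in 32 bits")
    r = secrets.randbits(32)
    # Zero-pad each half to 8 hex digits before concatenating.
    return f"{r:08x}{r ^ n:08x}"


def decode(identifier: str) -> int:
    """Recover the ordinal by xor-ing the two 32-bit halves."""
    return int(identifier[:8], 16) ^ int(identifier[8:], 16)


if __name__ == "__main__":
    serial = encode(12345)
    print(serial, decode(serial))
```

Note that anyone who knows the construction can recover N from an identifier, so this disguises the ordinal rather than hiding it.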



Source: https://stackoverflow.com/questions/6885667/md5-hash-collisions
