For each of our binary assets we generate a MD5 hash. This is used to check whether a certain binary asset is already in our application. But is it possible that two differe
Yes, it is possible that two different strings can generate the same MD5 hash code.
Here is a simple test using very similar binary message in hex string:
$ echo '4dc968ff0ee35c209572d4777b721587d36fa7b21bdc56b74a3dc0783e7b9518afbfa200a8284bf36e8e4b55b35f427593d849676da0d1555d8360fb5f07fea2' | xxd -r -p | tee >/dev/null >(md5) >(sha1sum)
c6b384c4968b28812b676b49d40c09f8af4ed4cc -
008ee33a9d58b51cfeb425b0959121c9
$ echo '4dc968ff0ee35c209572d4777b721587d36fa7b21bdc56b74a3dc0783e7b9518afbfa202a8284bf36e8e4b55b35f427593d849676da0d1d55d8360fb5f07fea2' | xxd -r -p | tee >/dev/null >(md5) >(sha1sum)
c728d8d93091e9c7b87b43d9e33829379231d7ca -
008ee33a9d58b51cfeb425b0959121c9
They generate different SHA-1 sum, but the same MD5 hash value. Secondly the strings are very similar, so it's difficult to find the difference between them.
The difference can be found by the following command:
$ diff -u <(echo 4dc968ff0ee35c209572d4777b721587d36fa7b21bdc56b74a3dc0783e7b9518afbfa200a8284bf36e8e4b55b35f427593d849676da0d1555d8360fb5f07fea2 | fold -w2) <(echo 4dc968ff0ee35c209572d4777b721587d36fa7b21bdc56b74a3dc0783e7b9518afbfa202a8284bf36e8e4b55b35f427593d849676da0d1d55d8360fb5f07fea2 | fold -w2)
--- /dev/fd/63 2016-02-05 12:55:04.000000000 +0000
+++ /dev/fd/62 2016-02-05 12:55:04.000000000 +0000
@@ -33,7 +33,7 @@
af
bf
a2
-00
+02
a8
28
4b
@@ -53,7 +53,7 @@
6d
a0
d1
-55
+d5
5d
83
60
Above collision example is taken from Marc Stevens: Single-block collision for MD5, 2012; he explains his method, with source code (alternate link to the paper).
Another test:
$ echo '0e306561559aa787d00bc6f70bbdfe3404cf03659e704f8534c00ffb659c4c8740cc942feb2da115a3f4155cbb8607497386656d7d1f34a42059d78f5a8dd1ef' | xxd -r -p | tee >/dev/null >(md5) >(sha1sum)
756f3044edf52611a51a8fa7ec8f95e273f21f82 -
cee9a457e790cf20d4bdaa6d69f01e41
$ echo '0e306561559aa787d00bc6f70bbdfe3404cf03659e744f8534c00ffb659c4c8740cc942feb2da115a3f415dcbb8607497386656d7d1f34a42059d78f5a8dd1ef' | xxd -r -p | tee >/dev/null >(md5) >(sha1sum)
6d5294e385f50c12745a4d901285ddbffd3842cb -
cee9a457e790cf20d4bdaa6d69f01e41
Different SHA-1 sum, the same MD5 hash.
Difference is in one byte:
$ diff -u <(echo 0e306561559aa787d00bc6f70bbdfe3404cf03659e704f8534c00ffb659c4c8740cc942feb2da115a3f4155cbb8607497386656d7d1f34a42059d78f5a8dd1ef | fold -w2) <(echo 0e306561559aa787d00bc6f70bbdfe3404cf03659e744f8534c00ffb659c4c8740cc942feb2da115a3f415dcbb8607497386656d7d1f34a42059d78f5a8dd1ef | fold -w2)
--- /dev/fd/63 2016-02-05 12:56:43.000000000 +0000
+++ /dev/fd/62 2016-02-05 12:56:43.000000000 +0000
@@ -19,7 +19,7 @@
03
65
9e
-70
+74
4f
85
34
@@ -41,7 +41,7 @@
a3
f4
15
-5c
+dc
bb
86
07
Above example is adapted from Tao Xie and Dengguo Feng: Construct MD5 Collisions Using Just A Single Block Of Message, 2010.
Related: