For each of our binary assets we generate a MD5 hash. This is used to check whether a certain binary asset is already in our application. But is it possible that two differe
I think we need to be careful choosing the hashing algorithm as per our requirement, as hash collisions are not as rare as I expected. I recently found a very simple case of hash collision in my project. I am using Python wrapper of xxhash for hashing. Link: https://github.com/ewencp/pyhashxx
s1 = 'mdsAnalysisResult105588'
s2 = 'mdsAlertCompleteResult360224'
pyhashxx.hashxx(s1) # Out: 2535747266
pyhashxx.hashxx(s2) # Out: 2535747266
It caused a very tricky caching issue in the system, then I finally found that it's a hash collision.