How do I assess the hash collision probability?

情歌与酒 2020-11-27 03:02

I'm developing a back-end application for a search system. The search system copies files to a temporary directory and gives them random names. Then it passes the temporary

5 answers
  •  旧时难觅i
    2020-11-27 03:55

    Equal hashes mean equal files, unless someone malicious is tampering with your files and injecting collisions (which could be the case if the files are downloaded from the internet). If that is a concern, use a SHA-2 based function.

    There are no known accidental MD5 collisions; 1.47 x 10^-29 is a really, really small number.
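For a sense of where a number of that size comes from, here is a minimal sketch of the usual birthday-bound approximation, p ≈ n(n-1)/2 / 2^bits, which holds while p is tiny. The file count of 100,000 is an assumption chosen for illustration (it is not stated in the text above), and `collision_probability` is a hypothetical helper name.

```python
from fractions import Fraction


def collision_probability(n_files: int, hash_bits: int) -> float:
    """Approximate chance that any two of n_files share a hash value.

    Uses the birthday-bound approximation (number of pairs divided by
    the number of possible hash values), which is accurate only while
    the result is much smaller than 1.
    """
    pairs = n_files * (n_files - 1) // 2
    # Fraction keeps the huge integers exact until the final conversion.
    return float(Fraction(pairs, 2 ** hash_bits))


# Assumed file count, for illustration only.
N_FILES = 100_000

p_md5 = collision_probability(N_FILES, 128)     # MD5 is a 128-bit hash
p_sha256 = collision_probability(N_FILES, 256)  # SHA-256 is a 256-bit hash

print(f"MD5:     {p_md5:.3g}")
print(f"SHA-256: {p_sha256:.3g}")
```

With that assumed file count, the MD5 result comes out at roughly 1.47 x 10^-29. Note that this only models accidental collisions; it says nothing about collisions crafted deliberately by an attacker.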

    To avoid rehashing big files, I would use a three-phase identity scheme:

    1. File size alone
    2. File size + a hash of four 64 KB chunks taken from different positions in the file
    3. A full hash

    If you see a file with a new size, you know for certain that it is not a duplicate. If the size matches an existing file, compare the sampled hashes, and compute the full hash only when those match too.
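The three phases above could be sketched roughly as follows. This is an illustration under assumptions, not a definitive implementation: the evenly spaced sample offsets, the choice of SHA-256, and the function names (`sampled_hash`, `full_hash`, `is_duplicate`) are all hypothetical, since the answer only says "different positions" and "a SHA2 based function".

```python
import hashlib
import os
from typing import Sequence

CHUNK = 64 * 1024  # 64 KB per sample, as in phase 2
SAMPLES = 4        # four samples per file


def sampled_hash(path: str) -> str:
    """Hash four 64 KB chunks at evenly spaced offsets (assumed layout)."""
    size = os.path.getsize(path)
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for i in range(SAMPLES):
            # Offsets run from the start of the file to the last full chunk.
            offset = max(0, size - CHUNK) * i // (SAMPLES - 1)
            f.seek(offset)
            h.update(f.read(CHUNK))
    return h.hexdigest()


def full_hash(path: str) -> str:
    """Hash the whole file, reading 1 MB at a time."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):
            h.update(block)
    return h.hexdigest()


def is_duplicate(candidate: str, known: Sequence[str]) -> bool:
    """Return True if candidate matches a known file, cheapest check first."""
    # Phase 1: a size never seen before cannot be a duplicate.
    size = os.path.getsize(candidate)
    same_size = [p for p in known if os.path.getsize(p) == size]
    if not same_size:
        return False
    # Phase 2: compare the cheap sampled hash.
    sample = sampled_hash(candidate)
    same_sample = [p for p in same_size if sampled_hash(p) == sample]
    if not same_sample:
        return False
    # Phase 3: only now pay for a full hash.
    digest = full_hash(candidate)
    return any(full_hash(p) == digest for p in same_sample)
```

In a real system the sizes and hashes of known files would presumably be stored in an index rather than recomputed on every call; the sketch recomputes them only to stay short.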
