What is the fastest hash algorithm to check if two files are equal?

野性不改 2020-12-07 10:15

What is the fastest way to create a hash function which will be used to check if two files are equal?

Security is not very important.

Edit: I am sending a fi

12 answers

谎友^ (original poster)

2020-12-07 10:57

What we are optimizing here is time spent on a task. Unfortunately we do not know enough about the task at hand to know what the optimal solution should be.

Is it for a one-time comparison of two arbitrary files? Then compare their sizes, and if those match, simply compare the files byte by byte (or MB by MB, if that's better for your IO).
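A minimal sketch of that one-time comparison in Python. The function name `files_equal` and the 1 MB chunk size are illustrative choices, not something from the question:

```python
import os


def files_equal(path_a, path_b, chunk_size=1 << 20):
    """Return True if the two files have identical contents."""
    # Different sizes mean different contents; nothing needs to be read.
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            chunk_a = fa.read(chunk_size)
            chunk_b = fb.read(chunk_size)
            if chunk_a != chunk_b:
                return False  # stop at the first differing chunk
            if not chunk_a:
                return True  # both files reached EOF together
```

The standard library's `filecmp.cmp(path_a, path_b, shallow=False)` performs a similar content comparison if you would rather not write the loop yourself.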

If it is for two large sets of files, or many sets of files, and it is not a one-time exercise but something that will happen frequently, then one should store hashes for each file. A hash is never unique, but a 32-bit hash (a number of up to 10 decimal digits) has about 4.3 billion possible values, and a 64-bit hash has about 1.8 * 10^19 (some 18 quintillion) possible values, which is enough to tell a very large number of files apart.
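For reference, the exact sizes of the 32-bit and 64-bit hash spaces can be checked with a couple of lines of Python (the variable names are only for illustration):

```python
# Number of distinct values a hash of each width can take.
values_32 = 2 ** 32
values_64 = 2 ** 64

print(f"{values_32:,}")  # 4,294,967,296
print(f"{values_64:,}")  # 18,446,744,073,709,551,616
```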

A decent compromise would be to generate two 32-bit hashes for each file, one for the first 8k, another for 1MB+8k, and slap them together as a single 64-bit number. Cataloging all existing files into a DB should be fairly quick, and looking up a candidate file against this DB should also be very quick. Once there is a match, the only way to determine whether the files are the same is to compare them in full.
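A rough sketch of such a combined key in Python, under some loudly stated assumptions: CRC-32 from `zlib` stands in for "a 32-bit hash" (the answer does not name one), "1MB+8k" is read as the 8k block starting at the 1 MB offset, and the names `quick_key` and `build_catalog` plus the in-memory dict standing in for the DB are all hypothetical:

```python
import zlib
from collections import defaultdict

SAMPLE = 8 * 1024        # 8k sample size
OFFSET = 1024 * 1024     # assumption: second sample starts at the 1 MB mark


def quick_key(path):
    """Combine two 32-bit CRCs of small file samples into one 64-bit key."""
    with open(path, "rb") as f:
        first = f.read(SAMPLE)
        f.seek(OFFSET)           # seeking past EOF is fine; read() returns b""
        second = f.read(SAMPLE)
    # zlib.crc32 returns an unsigned 32-bit integer in Python 3.
    return (zlib.crc32(first) << 32) | zlib.crc32(second)


def build_catalog(paths):
    """Map each 64-bit key to the files that share it (a stand-in for the DB)."""
    catalog = defaultdict(list)
    for path in paths:
        catalog[quick_key(path)].append(path)
    return catalog
```

Files that land under the same key are only candidates. As the answer says, a match still has to be confirmed by comparing the whole files. Storing the file size alongside the key would be a cheap extra filter.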

I am a believer in giving people what they need, which is not always what they think they need, or what they want.
