What is the fastest hash algorithm to check if two files are equal?

野性不改 2020-12-07 10:15

What is the fastest way to create a hash function which will be used to check if two files are equal?

Security is not very important.

Edit: I am sending a fi

12 answers
  •  谎友^ (original poster)
    2020-12-07 10:57

    What we are optimizing here is time spent on a task. Unfortunately we do not know enough about the task at hand to know what the optimal solution should be.

    Is it for a one-time comparison of two arbitrary files? Then compare the sizes first, and if they match, simply compare the files byte by byte (or MB by MB, if that is better for your IO).
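    The one-time comparison described above could be sketched roughly as follows. The function name `files_equal` and the 1 MiB chunk size are assumptions made for illustration, not something the question specifies.

```python
import os


def files_equal(path_a, path_b, chunk_size=1024 * 1024):
    """Return True if the two files have identical contents."""
    # Files of different sizes cannot be equal, so nothing needs to be read.
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            chunk_a = fa.read(chunk_size)
            chunk_b = fb.read(chunk_size)
            if chunk_a != chunk_b:
                return False  # first differing chunk ends the comparison
            if not chunk_a:
                return True  # both files reached end-of-file together
```

    Python's standard library offers `filecmp.cmp(path_a, path_b, shallow=False)` for the same purpose; the hand-written loop is shown only to make the size check and the early exit explicit.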

    If it is for two large sets of files, or many sets of files, and it is not a one-time exercise but something that will happen frequently, then one should store a hash for each file. A hash is never unique, but a 32-bit hash (a number of up to 10 decimal digits) has about 4 billion possible values, and a 64-bit hash has about 1.8 * 10^19 (roughly 18 quintillion) possible values.
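    As a quick check of how many distinct values a 32-bit and a 64-bit hash can take, here is a small sketch. The helper name `hash_space` is made up for this example.

```python
def hash_space(bits):
    """Number of distinct values a hash of the given bit width can take."""
    return 2 ** bits


for bits in (32, 64):
    values = hash_space(bits)
    print(f"{bits}-bit: {values:,} values ({len(str(values))} decimal digits)")
# 32-bit: 4,294,967,296 values (10 decimal digits)
# 64-bit: 18,446,744,073,709,551,616 values (20 decimal digits)
```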

    A decent compromise would be to generate two 32-bit hashes for each file, one for the first 8k and another for 1MB+8k, and slap them together as a single 64-bit number. Cataloging all existing files into a DB should be fairly quick, and looking up a candidate file against this DB should also be very quick. Once there is a match, the only way to determine whether the files are the same is to compare them in full.
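    A minimal sketch of that compromise is below, under several assumptions: CRC-32 (via `zlib.crc32`) stands in for the unspecified 32-bit hash, "1MB+8k" is read as the 8 KiB block starting at the 1 MiB offset, and a plain dictionary stands in for the DB. The names `quick_fingerprint`, `build_catalog` and `find_identical` are hypothetical.

```python
import filecmp
import zlib

BLOCK = 8 * 1024             # 8 KiB sample size
SECOND_OFFSET = 1024 * 1024  # assumed start of the second sample (1 MiB)


def quick_fingerprint(path):
    """Combine two 32-bit CRCs of sampled blocks into one 64-bit number."""
    with open(path, "rb") as f:
        head = zlib.crc32(f.read(BLOCK))
        f.seek(SECOND_OFFSET)
        # Files shorter than 1 MiB yield an empty read here, whose CRC is 0.
        tail = zlib.crc32(f.read(BLOCK))
    return (head << 32) | tail


def build_catalog(paths):
    """Map each fingerprint to the list of files that produce it."""
    catalog = {}
    for path in paths:
        catalog.setdefault(quick_fingerprint(path), []).append(path)
    return catalog


def find_identical(candidate, catalog):
    """Return cataloged files whose full contents equal the candidate's."""
    matches = catalog.get(quick_fingerprint(candidate), [])
    # A fingerprint match is only a hint; confirm with a full comparison.
    return [p for p in matches if filecmp.cmp(candidate, p, shallow=False)]
```

    Because only two small blocks are read per file, two files that differ elsewhere get the same fingerprint, which is why the final full comparison cannot be skipped.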

    I am a believer in giving people what they need, which is not always what they think they need, or what they want.
