Java and Hash algorithm to compare files [closed]

Submitted by 怎甘沉沦 on 2019-11-29 21:35:15

Question


I have to fingerprint files to find duplicates. What is recommended with Java in 2013? Should I also compare the file size, or is this an unnecessary check?

The probability of a false positive should be very close to 0.

EDIT: Lots of answers, thanks. What is the standard for backup software today? SHA-256? Something stronger? I guess MD5 is not suitable?


Answer 1:


If the probability of false positives has to be zero, as opposed to "lower than the probability you will be struck by lightning," then no hash algorithm at all can be used; you must compare the files byte by byte.

For what it's worth, if you can use third-party libraries, you can use Guava to compare two files byte by byte with the one-liner below (Files here is Guava's com.google.common.io.Files, not java.nio.file.Files):

Files.asByteSource(file1).contentEquals(Files.asByteSource(file2));

which takes care of opening and closing the files as well as the details of comparison.
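If a third-party library is not an option, the same byte-by-byte check can be sketched with the JDK alone. This is only a minimal illustration, not what Guava does internally; the class name ByteCompare and the method sameContent are hypothetical names chosen for the example.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class ByteCompare {
    // Hypothetical helper: returns true only if both files hold exactly the same bytes.
    static boolean sameContent(Path a, Path b) throws IOException {
        // Cheap early exit: files of different length cannot be identical.
        if (Files.size(a) != Files.size(b)) {
            return false;
        }
        // try-with-resources closes both streams even if reading fails.
        try (InputStream inA = new BufferedInputStream(Files.newInputStream(a));
             InputStream inB = new BufferedInputStream(Files.newInputStream(b))) {
            int byteA;
            while ((byteA = inA.read()) != -1) {
                if (byteA != inB.read()) {
                    return false;
                }
            }
            // inA is exhausted; the files match only if inB is exhausted too.
            return inB.read() == -1;
        }
    }
}
```

Because this reads every byte of both files, it cannot produce a false positive, but it also has to be run once per candidate pair.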

If you're willing to accept false positives that are less likely than getting struck by lightning, then you could do

Files.hash(file, Hashing.sha1()); // or md5(), or sha256(), or...

which returns a HashCode, and then you can test that for equality with the hash of another file. (That version also deals with the messiness of MessageDigest, of opening and closing the file properly, etcetera.)




Answer 2:


Are you asking how to get the MD5 checksums of files in Java? If that's the case then read the accepted answers here and here. Basically, do this:

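import java.io.FileInputStream;
import java.io.InputStream;
import java.security.DigestInputStream;
import java.security.MessageDigest;
...
...

MessageDigest md_1 = MessageDigest.getInstance("MD5");
MessageDigest md_2 = MessageDigest.getInstance("MD5");
// try-with-resources (Java 7+) closes both streams even if reading fails
try (InputStream is_1 = new DigestInputStream(new FileInputStream("file1.txt"), md_1);
     InputStream is_2 = new DigestInputStream(new FileInputStream("file2.txt"), md_2)) {
  byte[] buffer = new byte[8192];
  // a digest is only updated as bytes pass through its stream, so drain both streams
  while (is_1.read(buffer) != -1) { }
  while (is_2.read(buffer) != -1) { }
}
byte[] digest_1 = md_1.digest();
byte[] digest_2 = md_2.digest();

// compare digest_1 and digest_2
boolean same = MessageDigest.isEqual(digest_1, digest_2);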
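Since the question's edit asks about SHA-256, here is a hedged sketch of the same DigestInputStream approach wrapped in a reusable helper that takes the algorithm name. The class DigestCompare and the methods digestOf and sameDigest are hypothetical names for illustration; "SHA-256" and "MD5" are standard MessageDigest algorithm names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class DigestCompare {
    // Hypothetical helper: computes the digest of a file with the given algorithm.
    static byte[] digestOf(Path file, String algorithm)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance(algorithm);
        try (InputStream in = new DigestInputStream(Files.newInputStream(file), md)) {
            byte[] buffer = new byte[8192];
            // The digest is updated as a side effect of reading, so drain the stream.
            while (in.read(buffer) != -1) {
                // nothing to do with the bytes themselves
            }
        }
        return md.digest();
    }

    // Hypothetical helper: true if both files have the same SHA-256 digest.
    static boolean sameDigest(Path a, Path b)
            throws IOException, NoSuchAlgorithmException {
        return MessageDigest.isEqual(digestOf(a, "SHA-256"), digestOf(b, "SHA-256"));
    }
}
```

When many files have to be matched, the digest of each file can be computed once and stored, so that every later comparison only compares the stored digests.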

Should I also compare the file size, or is this an unnecessary check?

It is unnecessary.



Source: https://stackoverflow.com/questions/15441315/java-and-hash-algorithm-to-compare-files
