Best way to detect duplicate uploaded files in a Java Environment?

Submitted by 五迷三道 on 2019-12-04 10:05:51

Question


As part of a Java based web app, I'm going to be accepting uploaded .xls & .csv (and possibly other types of) files. Each file will be uniquely renamed with a combination of parameters and a timestamp.

I'd like to be able to identify any duplicate files. By duplicate I mean the exact same file, regardless of its name. Ideally, I'd like to detect duplicates as quickly as possible after the upload, so that the server can include this information in its response (provided the processing time, which grows with file size, doesn't add too much lag).

I've read about running MD5 on the files and storing the results as unique keys, etc., but I suspect there's a much better way. (Is there?)

Any advice on how best to approach this is appreciated.

Thanks.

UPDATE: I have nothing at all against using MD5. I've used it a few times in the past with Perl (Digest::MD5). I thought that in the Java world, another (better) solution might have emerged. But, it looks like I was mistaken.

Thank you all for the answers and comments. I'm feeling pretty good about using MD5 now.


Answer 1:


While processing uploaded files, decorate the OutputStream with a java.security.DigestOutputStream so that the digest of the file is calculated while it is being written. Store the final digest somewhere along with the unique identifier of the file (perhaps in hex, as part of the file name?).
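A minimal sketch of that idea, assuming the upload arrives as an InputStream and is saved to a local Path. The class and method names (DigestingUpload, saveAndDigest) are hypothetical, and SHA-256 stands in for whichever algorithm you prefer ("MD5" works the same way):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.DigestOutputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: saves an upload and hashes it in the same pass.
public class DigestingUpload {

    /** Copies the upload to target and returns the hex SHA-256 of the bytes written. */
    static String saveAndDigest(InputStream upload, Path target)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        // Every byte written through the DigestOutputStream also updates md.
        try (OutputStream out = new DigestOutputStream(Files.newOutputStream(target), md)) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = upload.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
        }
        // Two hex characters per byte, so the result has a fixed length.
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        Path tmp = Files.createTempFile("upload-", ".csv");
        byte[] data = "a,b\n1,2\n".getBytes(StandardCharsets.UTF_8);
        String digest = saveAndDigest(new ByteArrayInputStream(data), tmp);
        System.out.println(tmp.getFileName() + " -> " + digest);
        Files.delete(tmp);
    }
}
```

Because the digest is computed during the copy, the file never has to be read a second time, so the hash is ready as soon as the upload has been written.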




Answer 2:


You only need to add a method like this to your code and you're done. There's probably no better way; all the work is already done by the MessageDigest API (java.security.MessageDigest).

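import java.io.IOException;
import java.io.InputStream;
import java.math.BigInteger;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public static String calc(InputStream is) {
    byte[] buffer = new byte[8192];
    int read;

    try {
        MessageDigest digest = MessageDigest.getInstance("SHA-256"); // or "MD5"
        // read() returns -1 at end of stream
        while ((read = is.read(buffer)) != -1) {
            digest.update(buffer, 0, read);
        }
        byte[] hash = digest.digest();
        // Zero-pad to 2 hex chars per byte; BigInteger.toString(16) alone
        // drops leading zeros and gives variable-length strings.
        return String.format("%0" + (hash.length * 2) + "x", new BigInteger(1, hash));
    } catch (NoSuchAlgorithmException | IOException e) {
        e.printStackTrace(System.err);
        return null;
    }
}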
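Once a hex digest is available, detecting a duplicate comes down to a lookup keyed on it. The sketch below assumes an in-memory ConcurrentHashMap; the DuplicateRegistry class and its methods are hypothetical, and in a real web app the index would more likely be a database column with a unique constraint. It hashes a byte array directly to stay self-contained, but the string returned by calc(...) would serve as the key just as well:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory index: content digest -> name of the first stored file.
public class DuplicateRegistry {
    private final Map<String, String> byDigest = new ConcurrentHashMap<>();

    /** Hex-encodes the SHA-256 digest of the given bytes (2 hex chars per byte). */
    static String sha256Hex(byte[] content) throws NoSuchAlgorithmException {
        byte[] hash = MessageDigest.getInstance("SHA-256").digest(content);
        StringBuilder sb = new StringBuilder(hash.length * 2);
        for (byte b : hash) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    /**
     * Records an upload. Returns the name of an earlier file with identical
     * content, or null if this content has not been seen before.
     */
    public String register(String storedName, byte[] content) throws NoSuchAlgorithmException {
        // putIfAbsent is atomic, so two identical uploads racing cannot both "win".
        return byDigest.putIfAbsent(sha256Hex(content), storedName);
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        DuplicateRegistry registry = new DuplicateRegistry();
        byte[] csv = "id,name\n1,alice\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(registry.register("report_20190101.csv", csv)); // prints null
        System.out.println(registry.register("report_20190102.csv", csv)); // prints report_20190101.csv
    }
}
```

The renamed second upload is reported as a duplicate of the first because only the content, not the file name, goes into the key.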


Source: https://stackoverflow.com/questions/3721572/best-way-to-detect-duplicate-uploaded-files-in-a-java-environment
