Finding duplicate files and removing them

谎友^ 2020-11-27 09:26

I am writing a Python program to find and remove duplicate files from a folder.

I have multiple copies of mp3 files, and some other files. I am using the sha1 algorithm.
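For large mp3 files, reading the whole file into memory before hashing can be wasteful; here is a minimal sketch of chunked SHA-1 hashing (the helper name file_sha1 and the 64 KiB chunk size are illustrative choices, not from the original post):

    import hashlib

    def file_sha1(path, chunk_size=65536):
        # Feed the file to SHA-1 in fixed-size chunks so large files
        # never need to fit in memory all at once.
        h = hashlib.sha1()
        with open(path, 'rb') as f:
            for chunk in iter(lambda: f.read(chunk_size), b''):
                h.update(chunk)
        return h.hexdigest()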

8 Answers
  • 2020-11-27 10:15

    In order to be safe (removing them automatically can be dangerous if something goes wrong!), here is what I use, based on @zalew's answer.

    Please also note that the md5 sum code is slightly different from @zalew's, because his version reported too many false duplicates (which is why I said removing them automatically is dangerous!).

    import hashlib, os

    unique = dict()
    for filename in os.listdir('.'):
        if os.path.isfile(filename):
            # Hash the file's contents; identical content gives an identical digest
            with open(filename, 'rb') as f:
                filehash = hashlib.md5(f.read()).hexdigest()

            if filehash not in unique:
                unique[filehash] = filename
            else:
                print(filename + ' is a duplicate of ' + unique[filehash])
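    If you later decide to act on the duplicates this prints, a cautious variant (a sketch, not part of the original answer) is to ask for confirmation before each deletion instead of removing files automatically:

    import hashlib, os

    unique = dict()
    for filename in os.listdir('.'):
        if os.path.isfile(filename):
            with open(filename, 'rb') as f:
                filehash = hashlib.md5(f.read()).hexdigest()
            if filehash not in unique:
                unique[filehash] = filename
            # Delete only after an explicit 'y' from the user
            elif input('Delete %s (duplicate of %s)? [y/N] '
                       % (filename, unique[filehash])).lower() == 'y':
                os.remove(filename)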
    
  • 2020-11-27 10:16

    I wrote one in Python some time ago -- you're welcome to use it.

    import sys
    import os
    import hashlib

    def check_path(filepath, hashes):
        # Hash the file's contents and report the path if this digest was seen before
        with open(filepath, 'rb') as f:
            digest = hashlib.sha1(f.read()).hexdigest()
        if digest in hashes:
            print('DUPLICATE FILE\n   %s\nof %s' % (filepath, hashes[digest]))
        else:
            hashes[digest] = filepath

    def scan(dirpath):
        # Walk the directory tree, checking every regular file it contains
        hashes = {}
        for root, dirs, files in os.walk(dirpath):
            for filename in files:
                check_path(os.path.join(root, filename), hashes)

    if len(sys.argv) > 1:
        scan(sys.argv[1])
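    Saved as, say, dups.py (the filename is just an example), it can be run as python dups.py /some/directory; it walks the whole tree and prints each duplicate together with the first file seen with the same SHA-1 digest.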
    