I am writing a Python program to find and remove duplicate files from a folder.
I have multiple copies of mp3 files, and some other files. I am using the SHA-1 algorithm to compare file contents.
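For reference, here is a minimal sketch of the hashing step I have in mind (the function name and chunk size are illustrative); it reads the file in chunks so large mp3 files are not loaded into memory all at once:

import hashlib

def file_sha1(path, chunk_size=65536):
    # Read in chunks so even large files don't have to fit in memory.
    h = hashlib.sha1()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            h.update(chunk)
    return h.hexdigest()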
To be safe (removing files automatically can be dangerous if something goes wrong!), here is what I use, based on @zalew's answer.
Please also note that the md5 sum code is slightly different from @zalew's, because his code reported too many false duplicates (which is why I said removing them automatically is dangerous!).
import hashlib, os

unique = dict()
for filename in os.listdir('.'):
    if os.path.isfile(filename):
        # Hash the whole file in binary mode so the comparison is exact.
        with open(filename, 'rb') as f:
            filehash = hashlib.md5(f.read()).hexdigest()
        if filehash not in unique:
            unique[filehash] = filename
        else:
            print(filename + ' is a duplicate of ' + unique[filehash])
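If you want to go one step further and actually delete the duplicates, one possible extension (not part of the code above; the duplicates list and the confirmation prompt are my own additions) is to collect the duplicates first and remove each file only after explicit confirmation:

import hashlib, os

unique = dict()
duplicates = []
for filename in os.listdir('.'):
    if os.path.isfile(filename):
        with open(filename, 'rb') as f:
            filehash = hashlib.md5(f.read()).hexdigest()
        if filehash not in unique:
            unique[filehash] = filename
        else:
            duplicates.append(filename)

# Nothing is deleted until each file is confirmed by hand.
for filename in duplicates:
    if input('delete %s? [y/N] ' % filename).lower() == 'y':
        os.remove(filename)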
I wrote one in Python some time ago -- you're welcome to use it.
import sys
import os
import hashlib

# check_path: hash one file and report it if that hash was already seen.
check_path = (lambda filepath, hashes, p=sys.stdout.write:
    (lambda hash=hashlib.sha1(open(filepath, 'rb').read()).hexdigest():
        ((hash in hashes) and p('DUPLICATE FILE\n'
                                '   %s\n'
                                'of %s\n' % (filepath, hashes[hash]))
         or hashes.setdefault(hash, filepath)))())

# scan: walk the directory tree and check every file it contains.
scan = (lambda dirpath, hashes={}:
    [check_path(os.path.join(root, filename), hashes)
     for root, dirs, files in os.walk(dirpath)
     for filename in files])

(len(sys.argv) > 1) and scan(sys.argv[1])
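If you save it as, say, find_dups.py (the name is just an example), run it as python find_dups.py /path/to/folder; it prints each duplicate along with the first file seen with the same SHA-1 content hash.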