Question
Is there a way to recursively find duplicate files in an Amazon S3 bucket? In a normal file system, I would simply use:
fdupes -r /my/directory
Answer 1:
There is no "find duplicates" command in Amazon S3.
However, you could do the following:
- Retrieve a list of objects in the bucket
- Look for objects that have the same ETag (checksum) and Size
Objects that match on both would (extremely likely) be duplicates.
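A minimal sketch of that approach, assuming Python with boto3, credentials already configured, and a hypothetical bucket name 'myBucket': list every object with a paginator, group keys by (ETag, Size), and report any group holding more than one key.

import boto3
from collections import defaultdict

s3 = boto3.client('s3')
bucket = 'myBucket'  # hypothetical bucket name

# Group object keys by (ETag, Size); ListObjectsV2 returns both for each object.
groups = defaultdict(list)
paginator = s3.get_paginator('list_objects_v2')
for page in paginator.paginate(Bucket=bucket):
    for obj in page.get('Contents', []):
        groups[(obj['ETag'], obj['Size'])].append(obj['Key'])

# Any group with more than one key is a likely set of duplicates.
for (etag, size), keys in groups.items():
    if len(keys) > 1:
        print(size, 'bytes, ETag', etag, ':', keys)

Because ListObjectsV2 already returns the ETag and Size, this avoids a separate request per object.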
Answer 2:
Here's a git repository: https://github.com/chilts/node-awssum-scripts which has a JavaScript script to find duplicates in an S3 bucket. I know pointing you to an external source is not recommended, but I hope it may help you.
Answer 3:
import boto3

# ACCESS_KEY, SECRET_KEY, and region are placeholders you must supply yourself.
s3client = boto3.client('s3', aws_access_key_id=ACCESS_KEY, aws_secret_access_key=SECRET_KEY, region_name=region)

# HeadObject returns the object's metadata; the etag header is the checksum to compare.
etag = s3client.head_object(Bucket='myBucket', Key='index.html')['ResponseMetadata']['HTTPHeaders']['etag']
print(etag)
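Building on the snippet above, the same head_object call can also compare two specific objects directly; the keys here are hypothetical examples, and for full-bucket scans the listing-based sketch under Answer 1 is usually cheaper since it needs no per-key request.

# Compare two specific objects by ETag and size (example keys are hypothetical).
a = s3client.head_object(Bucket='myBucket', Key='index.html')
b = s3client.head_object(Bucket='myBucket', Key='backup/index.html')
if a['ETag'] == b['ETag'] and a['ContentLength'] == b['ContentLength']:
    print('Likely duplicates')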
Source: https://stackoverflow.com/questions/37063801/how-to-find-duplicate-files-in-an-aws-s3-bucket