How to find duplicate files in an AWS S3 bucket?


Question


Is there a way to recursively find duplicate files in an Amazon S3 bucket? In a normal file system, I would simply use:

fdupes -r /my/directory

Answer 1:


There is no "find duplicates" command in Amazon S3.

However, you could do the following:

  • Retrieve a list of objects in the bucket
  • Look for objects that have the same ETag (checksum) and Size

Objects that match on both ETag and Size are extremely likely to be duplicates; a rough sketch of this approach follows.
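
As a minimal illustration of that approach (not part of the original answer), here is a Python sketch using boto3's list_objects_v2 paginator. It assumes credentials are already configured, and the bucket name 'my-bucket' is a placeholder.

from collections import defaultdict
import boto3

s3 = boto3.client('s3')
groups = defaultdict(list)  # (ETag, Size) -> list of object keys

# List every object in the bucket; list_objects_v2 returns at most
# 1000 keys per call, so use the paginator.
paginator = s3.get_paginator('list_objects_v2')
for page in paginator.paginate(Bucket='my-bucket'):  # placeholder bucket name
    for obj in page.get('Contents', []):
        groups[(obj['ETag'], obj['Size'])].append(obj['Key'])

# Any group with more than one key is a likely set of duplicates.
for (etag, size), keys in groups.items():
    if len(keys) > 1:
        print(f"Likely duplicates ({size} bytes, ETag {etag}): {keys}")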




Answer 2:


Here's a git repository: https://github.com/chilts/node-awssum-scripts which contains a JavaScript script for finding duplicates in an S3 bucket. I know that pointing you to an external source is not usually recommended, but I hope it helps.




Answer 3:


import boto3

# ACCESS_KEY, SECRET_KEY and region are supplied by the caller.
s3client = boto3.client('s3', aws_access_key_id=ACCESS_KEY,
                        aws_secret_access_key=SECRET_KEY, region_name=region)
# A HEAD request returns the object's ETag in the response headers.
etag = s3client.head_object(Bucket='myBucket',
                            Key='index.html')['ResponseMetadata']['HTTPHeaders']['etag']
print(etag)


Source: https://stackoverflow.com/questions/37063801/how-to-find-duplicate-files-in-an-aws-s3-bucket
