How to find duplicate files in an AWS S3 bucket?


Question


Is there a way to recursively find duplicate files in an Amazon S3 bucket? In a normal file system, I would simply use:

fdupes -r /my/directory

Answer 1:


There is no "find duplicates" command in Amazon S3.

However, you could do the following:

  • Retrieve a list of objects in the bucket
  • Look for objects that have the same ETag (checksum) and Size

Objects that match on both ETag and Size are extremely likely to be duplicates; a rough sketch of this approach follows.
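
As a minimal illustration of that approach (not part of the original answer), here is a Python sketch using boto3's list_objects_v2 paginator. It assumes credentials are already configured, and the bucket name 'my-bucket' is a placeholder.

from collections import defaultdict
import boto3

s3 = boto3.client('s3')
groups = defaultdict(list)  # (ETag, Size) -> list of object keys

# List every object in the bucket; list_objects_v2 returns at most
# 1000 keys per call, so use the paginator.
paginator = s3.get_paginator('list_objects_v2')
for page in paginator.paginate(Bucket='my-bucket'):  # placeholder bucket name
    for obj in page.get('Contents', []):
        groups[(obj['ETag'], obj['Size'])].append(obj['Key'])

# Any group with more than one key is a likely set of duplicates.
for (etag, size), keys in groups.items():
    if len(keys) > 1:
        print(f"Likely duplicates ({size} bytes, ETag {etag}): {keys}")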




Answer 2:


Here's a git repository: https://github.com/chilts/node-awssum-scripts which contains a JavaScript script for finding duplicates in an S3 bucket. I know that pointing you to an external source is not usually recommended, but I hope it helps.




Answer 3:


import boto3

# ACCESS_KEY, SECRET_KEY and region are supplied by the caller.
s3client = boto3.client('s3', aws_access_key_id=ACCESS_KEY,
                        aws_secret_access_key=SECRET_KEY, region_name=region)
# A HEAD request returns the object's ETag in the response headers.
etag = s3client.head_object(Bucket='myBucket',
                            Key='index.html')['ResponseMetadata']['HTTPHeaders']['etag']
print(etag)


Source: https://stackoverflow.com/questions/37063801/how-to-find-duplicate-files-in-an-aws-s3-bucket
