Fastest way to sync two Amazon S3 buckets

滥情空心 · 2020-12-15 05:13

I have an S3 bucket with around 4 million files taking up some 500 GB in total. I need to sync the files to a new bucket (actually, changing the name of the bucket would suffice).

5 Answers
  •  暗喜 (OP) · 2020-12-15 05:59

    Background: The bottlenecks in the sync command are listing objects and copying objects. Listing objects is normally a serial operation, although if you specify a prefix you can list just that subset of objects; this is the only trick available for parallelizing it. Copying objects can be done in parallel.
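
    To illustrate the prefix trick, here is a minimal boto3 sketch (the bucket name and prefix are placeholders); each listing covers only the keys under its prefix, so several such listings can run concurrently:

```python
import boto3

s3 = boto3.client("s3")
paginator = s3.get_paginator("list_objects_v2")

# List only the keys that start with "a"; other prefixes can be listed
# at the same time from other threads or processes.
for page in paginator.paginate(Bucket="source-bucket", Prefix="a"):
    for obj in page.get("Contents", []):
        print(obj["Key"])
```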

    Unfortunately, aws s3 sync doesn't do any parallelizing, and it doesn't even support listing by prefix unless the prefix ends in / (i.e., it can only list by folder). This is why it's so slow.
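
    One workaround is to launch several aws s3 sync processes yourself, one per top-level folder. A rough sketch, where the bucket names and folder list are placeholders for whatever your key layout actually looks like:

```python
import subprocess

SRC = "s3://source-bucket"   # placeholder source bucket
DST = "s3://dest-bucket"     # placeholder destination bucket
folders = ["images", "logs", "exports"]  # assumed top-level folders

# One sync per folder; the processes run concurrently.
procs = [
    subprocess.Popen(["aws", "s3", "sync", f"{SRC}/{f}/", f"{DST}/{f}/"])
    for f in folders
]
for p in procs:
    p.wait()
```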

    s3s3mirror (and many similar tools) parallelizes the copying. I don't think it (or any other tool) parallelizes listing objects, because that requires a priori knowledge of how the objects are named. However, it does support prefixes, so you can invoke it multiple times, once for each letter of the alphabet (or whatever split is appropriate).

    You can also roll your own using the AWS API.
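
    For example, here is a rough roll-your-own sketch with boto3 that splits the work by the leading character of the key (the bucket names and the prefix set are assumptions; adjust them to your naming scheme). Note that copy_object only handles objects up to 5 GB; larger objects would need a multipart copy.

```python
import string
from concurrent.futures import ThreadPoolExecutor

import boto3

SRC_BUCKET = "source-bucket"  # placeholder names
DST_BUCKET = "dest-bucket"

s3 = boto3.client("s3")

def copy_prefix(prefix: str) -> None:
    """List every key under `prefix` in the source bucket and copy it across."""
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=SRC_BUCKET, Prefix=prefix):
        for obj in page.get("Contents", []):
            # Server-side copy; objects over 5 GB need a multipart copy instead.
            s3.copy_object(
                Bucket=DST_BUCKET,
                Key=obj["Key"],
                CopySource={"Bucket": SRC_BUCKET, "Key": obj["Key"]},
            )

# One worker per leading character: both listing and copying happen in parallel.
prefixes = list(string.ascii_lowercase + string.digits)
with ThreadPoolExecutor(max_workers=16) as pool:
    list(pool.map(copy_prefix, prefixes))
```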

    Lastly, the aws s3 sync command itself (and any tool, for that matter) should be a bit faster if you run it from an EC2 instance in the same region as your S3 buckets.
