Best way to move files between S3 buckets?

暗喜 2020-12-07 11:09

I'd like to copy some files from a production bucket to a development bucket daily.

For example: Copy productionbucket/feed/feedname/date to developmentbucket/feed/

12 Answers
  •  半阙折子戏
    2020-12-07 11:22

    I spent days writing my own custom tool to parallelize the copies required for this, but then I ran across documentation on how to get the AWS S3 CLI sync command to synchronize buckets with massive parallelization. The following commands will tell the AWS CLI to use 1,000 threads to execute jobs (each a small file or one part of a multipart copy) and look ahead 100,000 jobs:

    aws configure set default.s3.max_concurrent_requests 1000
    aws configure set default.s3.max_queue_size 100000
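
    If changing the default profile for every later command is not wanted, the same two settings can probably be kept in a separate config file instead. This is a minimal sketch, assuming the nested `s3 =` layout the CLI uses in its config file; `write_s3_tuning_config` and the file path are hypothetical names chosen for illustration:

    ```shell
    # Sketch: write the two S3 transfer settings into a separate config file
    # instead of changing the default profile in ~/.aws/config.
    # write_s3_tuning_config is a hypothetical helper name.
    write_s3_tuning_config() {
        config_path="$1"   # where to write the config file
        printf '%s\n' \
            '[default]' \
            's3 =' \
            '  max_concurrent_requests = 1000' \
            '  max_queue_size = 100000' > "$config_path"
    }
    ```

    For a single run, calling `write_s3_tuning_config /tmp/s3-tuning.cfg` and then prefixing the sync command with `AWS_CONFIG_FILE=/tmp/s3-tuning.cfg` should apply the settings to that command only. While that variable is set, the CLI reads that file in place of ~/.aws/config, so a region or other settings kept there would need to be copied over.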
    

    After running these, you can use the simple sync command as follows:

    aws s3 sync s3://source-bucket/source-path s3://destination-bucket/destination-path
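
    For the daily copy described in the question, the sync command could be wrapped in a small function. This is only a sketch: the bucket names come from the question, while `sync_feed_day`, its arguments, and the choice to mirror the feedname/date path under the destination's feed/ prefix are assumptions:

    ```shell
    # Sketch: sync one feed's folder for one date from production to development.
    # Bucket names are taken from the question; the function name and the
    # destination layout (feed/<feedname>/<date>) are assumptions.
    SRC_BUCKET="productionbucket"
    DST_BUCKET="developmentbucket"

    sync_feed_day() {
        feed="$1"   # feed name, e.g. feedname
        day="$2"    # date folder, e.g. 2020-12-07
        shift 2     # any remaining arguments are passed through to the CLI
        aws s3 sync \
            "s3://${SRC_BUCKET}/feed/${feed}/${day}" \
            "s3://${DST_BUCKET}/feed/${feed}/${day}" \
            "$@"
    }
    ```

    Running `sync_feed_day feedname "$(date -u +%Y-%m-%d)" --dryrun` should list what would be copied for today's date without transferring anything; dropping `--dryrun` performs the copy. Scheduling it daily (cron or similar) is left to the environment.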
    

    On an m4.xlarge instance in AWS (4 cores, 16 GB RAM), for my case (3-50 GB files), the sync/copy speed went from about 9.5 MiB/s to 700+ MiB/s, a speed increase of roughly 70x over the default configuration.

    Update: Note that S3CMD has been updated over the years and these changes are now only effective when you're working with lots of small files. Also note that S3CMD on Windows (only on Windows) is seriously limited in overall throughput and can only achieve about 3Gbps per process no matter what instance size or settings you use. Other systems like S5CMD have the same problem. I've spoken to the S3 team about this and they're looking into it.
