Best way to move files between S3 buckets?

暗喜 2020-12-07 11:09

I'd like to copy some files from a production bucket to a development bucket daily.

For example: Copy productionbucket/feed/feedname/date to developmentbucket/feed/

12 Answers
  • 2020-12-07 11:22

    I spent days writing my own custom tool to parallelize the copies required for this, but then I ran across documentation on how to get the AWS S3 CLI sync command to synchronize buckets with massive parallelization. The following commands will tell the AWS CLI to use 1,000 threads to execute jobs (each a small file or one part of a multipart copy) and look ahead 100,000 jobs:

    aws configure set default.s3.max_concurrent_requests 1000
    aws configure set default.s3.max_queue_size 100000
    

    After running these, you can use the simple sync command as follows:

    aws s3 sync s3://source-bucket/source-path s3://destination-bucket/destination-path
    

    On an m4.xlarge machine (in AWS--4 cores, 16GB RAM), for my case (3-50GB files) the sync/copy speed went from about 9.5MiB/s to 700+MiB/s, a speed increase of 70x over the default configuration.
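
For the daily copy in the question, the sync command above could be wrapped in a small script. This is only a sketch under assumptions: the bucket names come from the question, the date folder is assumed to be named YYYY-MM-DD, and the destination is assumed to mirror the feedname/date layout. The helper names build_sync_command and run_daily_sync are hypothetical.

```python
import subprocess
from datetime import date


def build_sync_command(day, feed_name,
                       source_bucket="productionbucket",
                       dest_bucket="developmentbucket"):
    """Return the argv list that syncs one day's feed folder."""
    # Assumed key layout (not confirmed by the question): feed/<feed_name>/<YYYY-MM-DD>/
    prefix = f"feed/{feed_name}/{day.isoformat()}/"
    return [
        "aws", "s3", "sync",
        f"s3://{source_bucket}/{prefix}",
        f"s3://{dest_bucket}/{prefix}",
    ]


def run_daily_sync(feed_name):
    """Sync today's folder; raises CalledProcessError if the CLI exits non-zero."""
    subprocess.run(build_sync_command(date.today(), feed_name), check=True)
```

Calling run_daily_sync("feedname") from cron or another scheduler once a day would then copy only that day's prefix, and the concurrency settings configured above still apply because the script just invokes the AWS CLI.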

    Update: Note that S3CMD has been updated over the years and these changes are now only effective when you're working with lots of small files. Also note that S3CMD on Windows (only on Windows) is seriously limited in overall throughput and can only achieve about 3Gbps per process no matter what instance size or settings you use. Other systems like S5CMD have the same problem. I've spoken to the S3 team about this and they're looking into it.

  • 2020-12-07 11:30

    Update

    As pointed out by alberge (+1), the excellent AWS Command Line Interface is now the most versatile approach for interacting with (almost) all things AWS. It covers most services' APIs and also features higher-level S3 commands that deal with your use case specifically; see the AWS CLI reference for S3:

    • sync - Syncs directories and S3 prefixes. Your use case is covered by Example 2 (more fine grained usage with --exclude, --include and prefix handling etc. is also available):

      The following sync command syncs objects under a specified prefix and bucket to objects under another specified prefix and bucket by copying s3 objects. [...]

      aws s3 sync s3://from_my_bucket s3://to_my_other_bucket
      

    For completeness, I'll mention that the lower-level S3 commands are still available via the s3api subcommand, which would allow you to translate any SDK-based solution directly to the AWS CLI before eventually adopting its higher-level functionality.


    Initial Answer

    Moving files between S3 buckets can be achieved by means of the PUT Object - Copy API (followed by DELETE Object):

    This implementation of the PUT operation creates a copy of an object that is already stored in Amazon S3. A PUT copy operation is the same as performing a GET and then a PUT. Adding the request header, x-amz-copy-source, makes the PUT operation copy the source object into the destination bucket. Source

    There are respective samples for all existing AWS SDKs available, see Copying Objects in a Single Operation. Naturally, a scripting based solution would be the obvious first choice here, so Copy an Object Using the AWS SDK for Ruby might be a good starting point; if you prefer Python instead, the same can be achieved via boto as well of course, see method copy_key() within boto's S3 API documentation.

    PUT Object - Copy only copies files, so you still need to delete the source explicitly via DELETE Object after a successful copy. That is just a few more lines once the overall script handling the bucket and file names is in place (there are respective examples as well, see e.g. Deleting One Object Per Request).
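
As a rough illustration of the copy-then-delete sequence in Python, here is a minimal sketch using the newer boto3 client interface instead of the older boto copy_key() mentioned above. The function name move_object is hypothetical, and the client is passed in so the function itself has no AWS dependency.

```python
def move_object(client, source_bucket, source_key, dest_bucket, dest_key):
    """Copy one object to the destination, then delete the source."""
    # copy_object issues the PUT Object - Copy request. boto3 normally raises
    # an exception when the copy fails, so the delete is skipped in that case.
    client.copy_object(
        Bucket=dest_bucket,
        Key=dest_key,
        CopySource={"Bucket": source_bucket, "Key": source_key},
    )
    # Only reached after the copy call returned without raising.
    client.delete_object(Bucket=source_bucket, Key=source_key)


# Usage (assumes boto3 is installed and credentials are configured):
#   import boto3
#   move_object(boto3.client("s3"), "productionbucket", "feed/a.csv",
#               "developmentbucket", "feed/a.csv")
```

Note that a single copy request is limited to objects of up to 5 GB; larger objects need a multipart copy, which this sketch does not attempt.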

  • 2020-12-07 11:30

    I know this is an old thread, but for others who end up here, my suggestion is to create a scheduled job that copies content from the production bucket to the development one.

    If you use .NET, this article might help you:

    https://edunyte.com/2015/03/aws-s3-copy-object-from-one-bucket-or/
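
For readers not on .NET, the body of such a scheduled job might look roughly like the following Python sketch. It assumes a boto3 S3 client is passed in; the function name copy_prefix and the prefix-swapping rule are illustrative assumptions, not taken from the linked article.

```python
def copy_prefix(client, source_bucket, source_prefix, dest_bucket, dest_prefix):
    """Copy every object under source_prefix to dest_prefix; return the count."""
    copied = 0
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=source_bucket, Prefix=source_prefix):
        # "Contents" is missing from a page when nothing matches the prefix.
        for obj in page.get("Contents", []):
            # Replace the source prefix with the destination prefix.
            dest_key = dest_prefix + obj["Key"][len(source_prefix):]
            client.copy_object(
                Bucket=dest_bucket,
                Key=dest_key,
                CopySource={"Bucket": source_bucket, "Key": obj["Key"]},
            )
            copied += 1
    return copied
```

A scheduler (cron, or whatever job runner is already in use) could call copy_prefix once a day with that day's prefix. The copies run one at a time here, so for large prefixes the parallel aws s3 sync approach from the other answers is likely to be faster.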

  • 2020-12-07 11:31

    .NET Example as requested:

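    // requestForExistingFile, objectToMerge and newKey are defined elsewhere in the calling code.
    using (client)
    {
        // List the objects matching the request; copy only when exactly one matches.
        var existingObjects = client.ListObjects(requestForExistingFile).S3Objects;
        if (existingObjects.Count == 1)
        {
            var requestCopyObject = new CopyObjectRequest()
            {
                SourceBucket = BucketNameProd,
                SourceKey = objectToMerge.Key,
                DestinationBucket = BucketNameDev,
                DestinationKey = newKey
            };
            // Copy from the production bucket to the development bucket.
            client.CopyObject(requestCopyObject);
        }
    }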
    

    with client being something like

    var config = new AmazonS3Config { CommunicationProtocol = Protocol.HTTP, ServiceURL = "s3-eu-west-1.amazonaws.com" };
    var client = AWSClientFactory.CreateAmazonS3Client(AWSAccessKey, AWSSecretAccessKey, config);
    

    There might be a better way, but it's just some quick code I wrote to get some files transferred.

  • 2020-12-07 11:31

    For the newer AWS CLI version 2 (its preview release was invoked as aws2; the final release uses the plain aws command):

    aws2 s3 sync s3://SOURCE_BUCKET_NAME s3://NEW_BUCKET_NAME
    
  • 2020-12-07 11:34

    These days I just use the copy and paste actions in the AWS S3 console. Navigate to the files you want to copy, click "Actions" -> "Copy", then navigate to the destination bucket and click "Actions" -> "Paste".

    It transfers the files pretty quickly, and it seems like a less convoluted solution that doesn't require any programming or other over-the-top tooling.
