I'd like to copy some files from a production bucket to a development bucket daily.
For example: Copy productionbucket/feed/feedname/date to developmentbucket/feed/
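Ideally this would run unattended from cron once a day. Roughly what I have in mind, with the tool and date format still to be decided (the AWS CLI and a YYYY-MM-DD layout are purely illustrative here; the % signs are escaped because cron treats % specially):

0 2 * * * aws s3 sync s3://productionbucket/feed/feedname/$(date +\%Y-\%m-\%d) s3://developmentbucket/feed/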
To move/copy from one bucket to another, or within the same bucket, I use the s3cmd tool and it works fine. For instance:
s3cmd cp --recursive s3://bucket1/directory1 s3://bucket2/directory1
s3cmd mv --recursive s3://bucket1/directory1 s3://bucket2/directory1
If you have a unix host within AWS, then use s3cmd from s3tools.org. Set up permissions so that your key has read access to your production bucket and write access to your development bucket. Then run:
s3cmd cp -r s3://productionbucket/feed/feedname/date s3://developmentbucket/feed/feedname
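For the daily run, the date component can be computed rather than typed by hand; a sketch assuming the feed directories are named YYYY-MM-DD and GNU date is available:

TODAY=$(date +%Y-%m-%d)
s3cmd cp -r "s3://productionbucket/feed/feedname/$TODAY" "s3://developmentbucket/feed/feedname/$TODAY"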
We had this exact problem with our ETL jobs at Snowplow, so we extracted our parallel file-copy code (Ruby, built on top of Fog) into its own Ruby gem, called Sluice:
https://github.com/snowplow/sluice
Sluice also handles S3 file delete, move and download, all parallelised and with automatic retry if an operation fails (which it does surprisingly often). I hope it's useful!
For me the following command just worked:
aws s3 mv s3://bucket/data s3://bucket/old_data --recursive
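Since the question calls for a copy (leaving production intact) across two buckets, the same form with cp should do it, using the question's paths:

aws s3 cp --recursive s3://productionbucket/feed/feedname/date s3://developmentbucket/feed/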
The new official AWS CLI natively supports most of the functionality of s3cmd. I'd previously been using s3cmd or the Ruby AWS SDK to do things like this, but the official CLI handles it nicely.
http://docs.aws.amazon.com/cli/latest/reference/s3/sync.html
aws s3 sync s3://oldbucket s3://newbucket
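Because sync only transfers objects that are new or changed, rerunning it daily is cheap. It can also be narrowed to a single day's prefix with filters; a sketch using the question's layout (2013-10-01 is just an example day, and the patterns are quoted so the shell doesn't expand them):

aws s3 sync s3://productionbucket/feed/feedname s3://developmentbucket/feed/feedname --exclude "*" --include "2013-10-01/*"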
Here is a Ruby class for performing this: https://gist.github.com/4080793
Example usage:
$ gem install aws-sdk
$ irb -r ./bucket_sync_service.rb
> from_creds = {aws_access_key_id: "XXX",
                aws_secret_access_key: "YYY",
                bucket: "first-bucket"}
> to_creds = {aws_access_key_id: "ZZZ",
              aws_secret_access_key: "AAA",
              bucket: "second-bucket"}
> syncer = BucketSyncService.new(from_creds, to_creds)
> syncer.debug = true # log each object
> syncer.perform