I'd like to copy some files from a production bucket to a development bucket daily.
For example: Copy productionbucket/feed/feedname/date to developmentbucket/feed/
To move or copy files from one bucket to another (or within the same bucket) I use the s3cmd tool, and it works fine. For instance:
s3cmd cp --recursive s3://bucket1/directory1 s3://bucket2/directory1
s3cmd mv --recursive s3://bucket1/directory1 s3://bucket2/directory1
If you have a unix host within AWS, then use s3cmd from s3tools.org. Set up permissions so that your key has read access to your production bucket and write access to your development bucket. Then run:
s3cmd cp -r s3://productionbucket/feed/feedname/date s3://developmentbucket/feed/feedname
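Since the goal is a daily copy, the same command can be driven from cron. A minimal sketch, assuming s3cmd has already been configured (s3cmd --configure) for the user that owns the crontab, and assuming the date path component is an ISO date (that layout is a guess; adjust the date format to match your feed):

# Run every day at 02:00; note % must be escaped as \% inside a crontab
0 2 * * * s3cmd cp -r s3://productionbucket/feed/feedname/$(date +\%Y-\%m-\%d) s3://developmentbucket/feed/feedname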
We had this exact problem with our ETL jobs at Snowplow, so we extracted our parallel file-copy code (Ruby, built on top of Fog) into its own Ruby gem, called Sluice:
https://github.com/snowplow/sluice
Sluice also handles S3 file delete, move and download, all parallelised and with automatic retry if an operation fails (which it does surprisingly often). I hope it's useful!
For me the following command just worked:
aws s3 mv s3://bucket/data s3://bucket/old_data --recursive
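Note that mv deletes the source objects after copying them. For the question's production-to-development scenario, where the source data should stay intact, the cp subcommand of the same CLI does a non-destructive copy (shown here with the question's example paths):

aws s3 cp s3://productionbucket/feed/feedname/date s3://developmentbucket/feed/feedname --recursive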
The new official AWS CLI natively supports most of the functionality of s3cmd. I'd previously been using s3cmd or the Ruby AWS SDK to do things like this, but the official CLI works great for it.
http://docs.aws.amazon.com/cli/latest/reference/s3/sync.html
aws s3 sync s3://oldbucket s3://newbucket
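Applied to the question's example paths, and with --dryrun first to preview what would be transferred (sync only copies objects that are missing or changed on the destination, which suits a recurring daily job):

aws s3 sync s3://productionbucket/feed/feedname/date s3://developmentbucket/feed/feedname --dryrun
aws s3 sync s3://productionbucket/feed/feedname/date s3://developmentbucket/feed/feedname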
Here is a Ruby class for performing this: https://gist.github.com/4080793
Example usage:
$ gem install aws-sdk
$ irb -r ./bucket_sync_service.rb
> from_creds = {aws_access_key_id:"XXX",
aws_secret_access_key:"YYY",
bucket:"first-bucket"}
> to_creds = {aws_access_key_id:"ZZZ",
aws_secret_access_key:"AAA",
bucket:"first-bucket"}
> syncer = BucketSyncService.new(from_creds, to_creds)
> syncer.debug = true # log each object
> syncer.perform