AWS: Ways of keeping cost down while backing up S3 files to Glacier? [closed]

As part of our project, we have created quite a bushy folder/file tree on S3, with all the files taking up about 6 TB of data. We currently have no backup of this data, which is bad. We want to do periodic backups, and Glacier seems like the way to go.

The question is: what are the ways to keep the total cost of a backup down?

Most of our files are text, so we can compress them and upload whole ZIP archives. This will require processing (on EC2), so I am curious whether there is any rule of thumb for comparing the extra cost of running an EC2 instance for zipping against the cost of just uploading uncompressed files.
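One rough way to frame that comparison is a break-even calculation: how many months of storage savings it takes to pay for the one-time compression run. The sketch below is only an illustration. The function name and every price, ratio and throughput figure are hypothetical placeholders, not quoted AWS rates, so substitute current numbers before drawing conclusions.

```python
def compression_break_even_months(data_gb, ratio, storage_price_gb_month,
                                  instance_price_hour, throughput_gb_hour):
    """Months of storage after which compressing pays for its own EC2 time.

    ratio is compressed size / original size (e.g. 0.3 for text).
    All prices are caller-supplied assumptions, not real AWS rates.
    """
    # One-time cost of the instance hours spent compressing.
    compute_cost = (data_gb / throughput_gb_hour) * instance_price_hour
    # Storage cost avoided every month because less data is stored.
    monthly_saving = data_gb * (1.0 - ratio) * storage_price_gb_month
    if monthly_saving <= 0:
        return float("inf")  # incompressible data never pays back
    return compute_cost / monthly_saving


# Hypothetical figures for illustration only.
months = compression_break_even_months(
    data_gb=6000, ratio=0.3, storage_price_gb_month=0.01,
    instance_price_hour=0.10, throughput_gb_hour=50,
)
print(f"break-even after {months:.2f} months")
```

If the result is much shorter than the time the backup is expected to live, compressing is probably worth it.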

Also, we would have to pay for data transfer, so I am wondering if there is any way of backing up other than (i) downloading each file from S3 to an instance and (ii) uploading it to Glacier, either in its raw form or zipped up.

I generally think of Glacier as an alternative storage to S3, not an additional storage. I.e., data would most often be stored either in S3 or Glacier, but rarely both.

If you trust S3's advertised eleven nines of durability, then you're not backing up because S3 itself is likely to lose the data.

You might want to back up the data because (like I do) you see your Amazon account as a single point of failure (e.g., credentials are compromised or Amazon blocks your account because they believe you are doing something abusive). However, in that case, Glacier is not a sufficient backup as it still falls under the Amazon umbrella.

I recommend backing up S3 data outside of Amazon if you are concerned about losing the data in S3 due to user error, compromised credentials, and the like.

I recommend using Glacier as a place to archive data for long term, cheap storage when you know you're not going to need to access it much, if ever. When things are transitioned to Glacier, you would then delete them from S3.

Amazon provides automatic archival from S3 to Glacier which works great, but beware of the extra costs if the average size of your files is small. Here's an article I wrote on that danger:

Cost of Transitioning S3 Objects to Glacier
http://alestic.com/2012/12/s3-glacier-costs
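To get a feel for why small objects are the problem, here is a minimal sketch of the arithmetic: the transition fee is charged per object, while the storage saving scales with size. The function names and all prices are hypothetical placeholders, and the sketch ignores any other per-object overhead, so treat it as an estimate only.

```python
def transition_request_cost(object_count, price_per_1000_requests):
    """One-time fee for transitioning object_count objects to Glacier."""
    return object_count / 1000 * price_per_1000_requests


def months_to_recoup(object_count, avg_object_kb, price_per_1000_requests,
                     storage_saving_per_gb_month):
    """Months of cheaper storage needed to pay back the transition fee."""
    one_time = transition_request_cost(object_count, price_per_1000_requests)
    total_gb = object_count * avg_object_kb / (1024 * 1024)
    monthly_saving = total_gb * storage_saving_per_gb_month
    if monthly_saving <= 0:
        return float("inf")
    return one_time / monthly_saving


# Hypothetical: 10 million objects of 10 KB each, placeholder prices.
fee = transition_request_cost(10_000_000, 0.05)
payback = months_to_recoup(10_000_000, 10, 0.05, 0.05)
print(f"one-time fee: {fee:.2f}, payback: {payback:.1f} months")
```

Note that the payback time depends only on the average object size and the prices, not on the number of objects, which is why bundling small files changes the picture so much.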

If you still want to copy from S3 to Glacier, here are some points related to your questions:

You will presumably leave the data in Glacier a long time, so compressing it is probably worth the short-term CPU usage. The exact trade-off depends on factors like the compressibility of your data, how long it takes to compress, and how often you need to perform the compression.
There is no charge for downloading data from S3 to an EC2 instance. There is no data transfer charge for uploading data into Glacier.
If you upload many small files to Glacier, the per-item upload charges can add up. You can save on cost by combining many small files into an archive and uploading it.
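As an illustration of the last point, here is a minimal sketch that packs a directory of small files into a single gzip-compressed tar archive using only the Python standard library. The function name `bundle_directory` is made up for this example, and it assumes the files have already been downloaded from S3 to local disk.

```python
import tarfile
from pathlib import Path


def bundle_directory(src_dir, archive_path):
    """Pack every file under src_dir into one .tar.gz and return the file count.

    archive_path should live outside src_dir, otherwise the archive
    would try to include itself.
    """
    src = Path(src_dir)
    count = 0
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in sorted(src.rglob("*")):
            if path.is_file():
                # Store paths relative to src_dir so the archive is relocatable.
                tar.add(path, arcname=path.relative_to(src).as_posix())
                count += 1
    return count
```

Calling `bundle_directory("data/2013-03", "backup-2013-03.tar.gz")` (both paths hypothetical) produces one item to upload instead of one per file, and compresses the text at the same time.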

Another S3 feature that can help protect against accidental loss through user error or attacks is to turn on S3 versioning and enable MFA Delete (MFA is multi-factor authentication). This prevents anybody from being able to permanently delete object versions unless they have the credentials plus a physical device in your possession.
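For reference, here is a sketch of what that request could look like if you use boto3. The helper `versioning_request` is a made-up name that only builds the arguments, and the bucket name, device serial and code are placeholders. As far as I know, MFA Delete can only be enabled with the root account's credentials, so check the S3 documentation before relying on this.

```python
def versioning_request(bucket, mfa_serial, mfa_code):
    """Build keyword arguments for S3 PutBucketVersioning with MFA Delete.

    The MFA value is the device serial number, a space, and the
    current code shown on the device.
    """
    return {
        "Bucket": bucket,
        "VersioningConfiguration": {
            "Status": "Enabled",
            "MFADelete": "Enabled",
        },
        "MFA": f"{mfa_serial} {mfa_code}",
    }


# Placeholder values; replace with your own bucket and MFA device.
kwargs = versioning_request(
    "example-bucket", "arn:aws:iam::123456789012:mfa/root-device", "123456"
)
# With boto3 installed and credentials configured, this could be sent as:
#   import boto3
#   boto3.client("s3").put_bucket_versioning(**kwargs)
print(kwargs)
```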

I initially addressed the same issue in my S3 buckets I wanted to back up by doing the following:

create a second "mirror" bucket for each S3 bucket I want to back up to Glacier
launch a micro Ubuntu server instance for running cron jobs
install s3cmd on the server
write a shell script to sync all objects from each bucket to the mirror bucket
enable Lifecycle rules on the mirror bucket to change the status of each object to "Glacier"
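The last step above can also be scripted. The following is a sketch of a lifecycle configuration that moves every object in the mirror bucket to the Glacier storage class, written as the dictionary boto3 expects. The helper name, the rule ID and the bucket name are placeholders I chose for the example, not part of my original setup.

```python
def glacier_lifecycle(days=0, prefix=""):
    """Build an S3 lifecycle configuration that transitions objects to Glacier.

    days is the object age at which the transition happens;
    an empty prefix applies the rule to the whole bucket.
    """
    return {
        "Rules": [
            {
                "ID": "archive-to-glacier",  # arbitrary label for the rule
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [{"Days": days, "StorageClass": "GLACIER"}],
            }
        ]
    }


config = glacier_lifecycle(days=0)
# With boto3 installed and credentials configured, this could be applied as:
#   import boto3
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="example-mirror-bucket", LifecycleConfiguration=config)
print(config)
```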

This works just fine, but I decided for my purposes that it was easier to just enable Versioning on my bucket. This ensures that if an object is accidentally deleted or updated, it can be recovered. The drawback to this approach is that the process of restoring an entire branch or sub-tree might be time consuming. But it is easier, more cost effective, and adequate for protecting the contents of the bucket from permanent destruction.

Hope that helps someone down the road.

Source: https://stackoverflow.com/questions/15231733/aws-ways-of-keeping-cost-down-while-backing-up-s3-files-to-glacier

Tags

amazon-s3

amazon-ec2

amazon-glacier