AWS: Ways of keeping cost down while backing up S3 files to Glacier? [closed]


I generally think of Glacier as an alternative to S3, not an addition to it. That is, data would most often be stored either in S3 or in Glacier, but rarely in both.

If you trust S3's advertised eleven nines of durability, then you're not backing up out of fear that S3 itself will lose the data.

You might want to back up the data because (like I do) you see your Amazon account as a single point of failure (e.g., credentials are compromised or Amazon blocks your account because they believe you are doing something abusive). However, in that case, Glacier is not a sufficient backup as it still falls under the Amazon umbrella.

I recommend backing up S3 data outside of Amazon if you are concerned about losing the data in S3 due to user error, compromised credentials, and the like.
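
One cheap way to do that is a nightly cron job on a machine outside AWS that pulls a copy of the bucket. A minimal sketch, assuming s3cmd is installed and configured; the bucket name and local path are placeholders:

    #!/bin/sh
    # Hypothetical sketch: nightly cron job that copies an S3 bucket to
    # storage outside AWS (a local disk here). Bucket name and target
    # path are placeholders.
    # Deliberately NOT passing --delete-removed, so objects deleted from
    # S3 (by mistake or by an attacker) survive in the local copy.
    s3cmd sync s3://my-important-bucket/ /mnt/offsite-backup/my-important-bucket/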

I recommend using Glacier as a place to archive data for long-term, cheap storage when you know you're not going to need to access it much, if ever. Once objects have been transitioned to Glacier, you would then delete them from S3.

Amazon provides automatic archival from S3 to Glacier which works great, but beware of the extra costs if the average size of your files is small. Here's an article I wrote on that danger:

Cost of Transitioning S3 Objects to Glacier
http://alestic.com/2012/12/s3-glacier-costs
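
If you go the lifecycle route, the transition rule can be attached from the command line. A minimal sketch using the AWS CLI, where the bucket name, the "archive/" prefix, and the 30-day window are all placeholder assumptions:

    #!/bin/sh
    # Hypothetical sketch: attach a lifecycle rule that transitions objects
    # under the "archive/" prefix to Glacier 30 days after creation.
    # Bucket name, prefix, and day count are placeholders.
    aws s3api put-bucket-lifecycle-configuration \
        --bucket my-archive-bucket \
        --lifecycle-configuration '{
          "Rules": [{
            "ID": "archive-to-glacier",
            "Status": "Enabled",
            "Filter": { "Prefix": "archive/" },
            "Transitions": [{ "Days": 30, "StorageClass": "GLACIER" }]
          }]
        }'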

If you still want to copy from S3 to Glacier, here are some points related to your questions:

  • You will presumably leave the data in Glacier a long time, so compressing it is probably worth the short-term CPU usage. The exact trade-off depends on factors like how compressible your data is, how long compression takes, and how often you need to perform it.

  • There is no charge for downloading data from S3 to an EC2 instance in the same region. There is no data transfer charge for uploading data into Glacier.

  • If you upload many small files to Glacier, the per-item upload charges can add up. You can save on cost by combining many small files into a single archive before uploading, as in the sketch after this list.
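
To make the last two points concrete, here is a minimal sketch that compresses a directory of small files into one tarball and uploads it as a single Glacier archive. The paths, vault name, and description are placeholders, and it assumes the AWS CLI is installed:

    #!/bin/sh
    # Hypothetical sketch: bundle many small files into one compressed
    # tarball so Glacier charges for a single upload instead of thousands.
    # Paths and vault name are placeholders.
    tar czf backup-2013-01.tar.gz /data/small-files/

    # Upload the single archive to a Glacier vault.
    # "--account-id -" means "the account owning the credentials".
    aws glacier upload-archive \
        --account-id - \
        --vault-name my-backup-vault \
        --archive-description "small files, Jan 2013" \
        --body backup-2013-01.tar.gz

Keep the archive description (or a separate index) somewhere you can find it: Glacier retrieval is by the archive ID returned from the upload, not by file name.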

Another S3 feature that can help protect against accidental loss through user error or attacks is to turn on S3 versioning with MFA Delete (multi-factor authentication). This prevents anyone from permanently deleting objects unless they have both your credentials and the physical MFA device in your possession.
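
A minimal sketch of turning both on with the AWS CLI. The bucket name, MFA device ARN, and token code are placeholders; note that only the root account's MFA device can enable MFA Delete:

    #!/bin/sh
    # Hypothetical sketch: enable versioning and MFA Delete on a bucket.
    # Bucket name, MFA device ARN, and the 6-digit token are placeholders.
    aws s3api put-bucket-versioning \
        --bucket my-important-bucket \
        --versioning-configuration Status=Enabled,MFADelete=Enabled \
        --mfa "arn:aws:iam::123456789012:mfa/root-account-mfa-device 123456"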

I initially addressed the same issue for the S3 buckets I wanted to back up by doing the following:

  1. create a second "mirror" bucket for each S3 bucket I want to back up to Glacier
  2. launch a micro Ubuntu server instance for running cron jobs
  3. install s3cmd on the server
  4. write a shell script to sync all objects from each bucket to its mirror bucket (see the sketch after this list)
  5. enable lifecycle rules on the mirror bucket to transition each object to the Glacier storage class
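
Step 4 boiled down to a loop like the following. The bucket names are placeholders, and it assumes s3cmd is configured and that your s3cmd version supports bucket-to-bucket sync:

    #!/bin/sh
    # Hypothetical sketch of the sync step: mirror each source bucket into
    # its "-mirror" counterpart. The mirror buckets carry the lifecycle
    # rule that transitions objects to Glacier. Bucket names are placeholders.
    for BUCKET in photos documents logs; do
        s3cmd sync "s3://$BUCKET/" "s3://$BUCKET-mirror/"
    done

    # Example crontab entry to run it nightly at 02:00:
    # 0 2 * * * /home/ubuntu/mirror-buckets.sh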

This works just fine, but I decided for my purposes that it was easier to just enable versioning on my buckets. This ensures that if an object is accidentally deleted or updated, it can be recovered. The drawback to this approach is that restoring an entire branch or sub-tree can be time-consuming. But it is easier, more cost-effective, and adequate for protecting the contents of the bucket from permanent destruction.

Hope that helps someone down the road.
