amazon-glacier

spark read partitioned data in S3 partly in glacier

早过忘川 submitted on 2019-11-30 21:39:02
I have a Parquet dataset in S3, partitioned by date (dt), with the oldest dates stored in AWS Glacier to save money. For instance, we have:

s3://my-bucket/my-dataset/dt=2017-07-01/ [in Glacier]
...
s3://my-bucket/my-dataset/dt=2017-07-09/ [in Glacier]
s3://my-bucket/my-dataset/dt=2017-07-10/ [not in Glacier]
...
s3://my-bucket/my-dataset/dt=2017-07-24/ [not in Glacier]

I want to read this dataset, but only the subset of dates that are not yet in Glacier, e.g.:

val from = "2017-07-15"
val to = "2017-08-24"
val path = "s3://my-bucket/my-dataset/"
val X = spark.read.parquet(path).where(col("dt"
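
One workaround is to avoid pointing Spark at the dataset root at all: build the list of partition paths for the wanted date range yourself and read only those, so the Glacier-archived partitions are never touched. The sketch below assumes the dt=YYYY-MM-DD layout shown above; the path-building helper is hypothetical, and the final Spark call is left as a comment since it needs a live SparkSession.

```python
from datetime import date, timedelta

def partition_paths(base: str, start: date, end: date) -> list[str]:
    """Build explicit dt=YYYY-MM-DD partition paths for [start, end]."""
    days = (end - start).days
    return [f"{base}dt={start + timedelta(d):%Y-%m-%d}/" for d in range(days + 1)]

paths = partition_paths("s3://my-bucket/my-dataset/", date(2017, 7, 10), date(2017, 7, 24))
# With a live SparkSession, read only these partitions; setting basePath
# keeps dt available as a column:
# df = (spark.read
#           .option("basePath", "s3://my-bucket/my-dataset/")
#           .parquet(*paths))
```

This trades partition discovery for an explicit path list, which is exactly what you want when part of the directory tree is unreadable.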

How does AWS transfer S3 objects to Glacier archives when you use lifecycle archive rules?

感情迁移 submitted on 2019-11-30 09:28:15
Amazon Web Services (AWS) S3 allows you to automatically transfer/archive objects from S3 to Glacier. Nowhere can I find an explanation of how that transfer actually happens. Which Glacier vault do S3 objects archive to? Does the lifecycle policy set any sort of description on the Glacier archives it creates? Does it create one archive per S3 object?

From the AWS FAQ: "Q: Can I use Amazon Glacier APIs to access objects that I've archived to Amazon Glacier?" Because Amazon S3 maintains the mapping between your user-defined object name and Amazon Glacier's system-defined identifier, Amazon S3 objects that are stored

Node reading file in specified chunk size

余生长醉 submitted on 2019-11-30 04:02:18
Question: The goal: upload large files to AWS Glacier without holding the whole file in memory. I'm currently uploading to Glacier using fs.readFileSync() and things are working. But I need to handle files larger than 4GB, and I'd like to upload multiple chunks in parallel. This means moving to multipart uploads. I can choose the chunk size, but then Glacier needs every chunk to be the same size (except the last). This thread suggests that I can set a chunk size on a read stream but that I'm not
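
The equal-chunk constraint is easiest to satisfy by computing byte ranges up front and reading each range independently, which also enables parallel part uploads. The sketch below is plain Python rather than Node, with the part size chosen arbitrarily and the actual Glacier upload call left as a comment.

```python
def part_ranges(total_size: int, part_size: int) -> list[tuple[int, int]]:
    """Split a file of total_size bytes into [start, end) ranges of
    part_size bytes; every part is equal-sized except possibly the last,
    which is what Glacier multipart uploads require."""
    return [(start, min(start + part_size, total_size))
            for start in range(0, total_size, part_size)]

# Example: a 10 MiB file in 4 MiB parts -> parts of 4, 4, and 2 MiB.
MiB = 1024 * 1024
ranges = part_ranges(10 * MiB, 4 * MiB)
# Each range can then be read and uploaded independently (and in parallel):
# with open(path, "rb") as f:
#     f.seek(start)
#     chunk = f.read(end - start)   # pass chunk to the multipart upload call
```

Because each worker seeks to its own offset and reads only its range, no more than one chunk per worker is ever held in memory.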

How to restore folders (or entire buckets) to Amazon S3 from Glacier?

蹲街弑〆低调 submitted on 2019-11-29 22:54:40
I changed the lifecycle for a bunch of my buckets on Amazon S3 so their storage class was set to Glacier. I did this using the online AWS Console. I now need those files again. I know how to restore them back to S3 per file, but my buckets have thousands of files. I wanted to see if there was a way to restore the entire bucket back to S3, just like there was a way to send the entire bucket to Glacier. I'm guessing there's a way to program a solution, but I wanted to see if there was a way to do it in the Console, or with another program, or something else I might be missing.

There isn't a
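
Programmatically, a bulk restore boils down to listing the bucket, keeping the keys whose StorageClass is GLACIER, and issuing one restore request per key. A sketch, with the selection logic as runnable Python and the boto3 calls (which need credentials and a live bucket) as comments:

```python
def glacier_keys(objects: list[dict]) -> list[str]:
    """Pick keys that are archived. list_objects_v2 reports a StorageClass
    per object; non-archived objects report a different class."""
    return [o["Key"] for o in objects if o.get("StorageClass") == "GLACIER"]

# Sample listing shaped like entries from a list_objects_v2 "Contents" page:
listing = [
    {"Key": "a/file1", "StorageClass": "GLACIER"},
    {"Key": "a/file2", "StorageClass": "STANDARD"},
]

for key in glacier_keys(listing):
    # boto3.client("s3").restore_object(
    #     Bucket="my-bucket", Key=key,
    #     RestoreRequest={"Days": 7,
    #                     "GlacierJobParameters": {"Tier": "Bulk"}})
    print("restore requested for", key)
```

The Bulk tier is the cheapest (and slowest) retrieval option; for thousands of files it is usually the right choice.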

Permanently restore Glacier to S3

陌路散爱 submitted on 2019-11-29 09:08:57
I'm wondering whether there is an easy way to permanently restore Glacier objects to S3. It seems that you can only restore Glacier objects for the fixed amount of time you specify when restoring to S3. So, for example, we now have thousands of files restored to S3 that will go back to Glacier in 90 days, but we do not want them back in Glacier.

To clarify a technicality on one point: your files will not "go back to" Glacier in 90 days, because they are still in Glacier; since you have done a restore, there are temporary copies living in S3 reduced redundancy storage (RRS) that S3 will
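
The usual way to make a restore permanent is to copy each object over itself with a new storage class while the temporary restored copy is readable. The parameter dict below matches the shape of S3's CopyObject API; the boto3 call is commented out since it needs a live bucket, and the bucket/key names are placeholders.

```python
def copy_in_place_params(bucket: str, key: str) -> dict:
    """Parameters for an S3 copy-over-itself that rewrites the object
    into STANDARD storage, permanently leaving Glacier."""
    return {
        "Bucket": bucket,
        "Key": key,
        "CopySource": {"Bucket": bucket, "Key": key},
        "StorageClass": "STANDARD",
        "MetadataDirective": "COPY",  # keep the object's existing metadata
    }

params = copy_in_place_params("my-bucket", "a/file1")
# boto3.client("s3").copy_object(**params)
```

After the copy completes, the object's storage class is STANDARD and it no longer expires back into the archived state when the restore window ends.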
