Spark: read partitioned data in S3 that is partly in Glacier
I have a Parquet dataset in S3 partitioned by date (dt), with the oldest dates stored in AWS Glacier to save some money. For instance, we have:

```
s3://my-bucket/my-dataset/dt=2017-07-01/  [in glacier]
...
s3://my-bucket/my-dataset/dt=2017-07-09/  [in glacier]
s3://my-bucket/my-dataset/dt=2017-07-10/  [not in glacier]
...
s3://my-bucket/my-dataset/dt=2017-07-24/  [not in glacier]
```

I want to read this dataset, but only the subset of dates that are not yet in Glacier, e.g.:

```scala
import org.apache.spark.sql.functions.col

val from = "2017-07-15"
val to   = "2017-08-24"
val path = "s3://my-bucket/my-dataset/"
val X = spark.read.parquet(path).where(col("dt").between(from, to))
```
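For context, here is a minimal sketch of the workaround I have been considering: instead of pointing Spark at the dataset root (which makes it list, and potentially open, the Glacier-archived prefixes), build the list of non-Glacier partition paths explicitly and pass them to `spark.read.parquet`, using the `basePath` option so partition discovery still surfaces `dt` as a column. The date range below is an assumption (I capped it at `2017-07-24`, the last partition shown above, since passing paths that don't exist would fail the read); everything else uses standard Spark API.

```scala
import java.time.LocalDate
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()

val basePath = "s3://my-bucket/my-dataset/"

// Assumption: the last non-Glacier partition is dt=2017-07-24.
val from = LocalDate.parse("2017-07-15")
val to   = LocalDate.parse("2017-07-24")

// Enumerate only the wanted dt= partition directories, so Spark
// never lists or reads the Glacier-archived prefixes.
val paths: Seq[String] = Iterator
  .iterate(from)(_.plusDays(1))
  .takeWhile(!_.isAfter(to))
  .map(d => s"${basePath}dt=$d/")   // LocalDate prints as yyyy-MM-dd
  .toSeq

// basePath keeps partition discovery working, so dt is still a column.
val df = spark.read
  .option("basePath", basePath)
  .parquet(paths: _*)
```

This avoids the problem that `spark.read.parquet(path).where(...)` can still touch Glacier objects during file listing or schema inference, but it assumes you know (or can list from S3) which partitions are restored/standard storage.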