Partition Athena query by S3 created date

梦如初夏 2020-12-06 03:37

I have an S3 bucket with ~70 million JSON files (~15 TB) and an Athena table to query them by timestamp and some other keys defined in the JSON.

It is guaranteed, that the ti

2 answers
  •  感动是毒
    2020-12-06 04:30

    I started from Theo's answer, and it was very close (thank you, Theo, for the excellent and very detailed response). However, according to the documentation, when adding multiple partitions you specify "ADD" only once, near the beginning of the statement.

    I tried specifying "ADD" on each line, as in Theo's example, but received an error. It works when "ADD" is specified only once. This is the format that worked for me:

    ALTER TABLE db.table_name ADD IF NOT EXISTS
     PARTITION (event_date = '2019-03-01') LOCATION 's3://bucket-name/2019-03-01/'
     PARTITION (event_date = '2019-03-02') LOCATION 's3://bucket-name/2019-03-02/'
     PARTITION (event_date = '2019-03-03') LOCATION 's3://bucket-name/2019-03-03/'
     ...
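
    With many dates, writing these clauses by hand gets tedious, so here is a rough sketch of generating the statement with a script. It assumes the same layout as above (one `s3://bucket-name/YYYY-MM-DD/` prefix per day and an `event_date` partition key); `build_add_partitions` is a hypothetical helper name, not part of any AWS library, and the sketch only builds the SQL text without submitting it to Athena.

```python
from datetime import date, timedelta


def build_add_partitions(table, bucket, start, end):
    """Build one ALTER TABLE statement covering every day from start to end (inclusive)."""
    clauses = []
    day = start
    while day <= end:
        d = day.isoformat()  # e.g. '2019-03-01'
        clauses.append(
            f" PARTITION (event_date = '{d}') LOCATION 's3://{bucket}/{d}/'"
        )
        day += timedelta(days=1)
    # ADD appears exactly once, before the first PARTITION clause.
    return f"ALTER TABLE {table} ADD IF NOT EXISTS\n" + "\n".join(clauses)


statement = build_add_partitions(
    "db.table_name", "bucket-name", date(2019, 3, 1), date(2019, 3, 3)
)
print(statement)
```

    The printed statement has the same shape as the one above and can be pasted into the Athena console.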
    
