How to avoid AWS Athena CTAS query creating small files?

借酒劲吻你 2020-12-19 15:59

I'm unable to figure out what is wrong with my CTAS query. It breaks the data into smaller files while storing it inside a partition, even though I haven't mentioned any bucke

2 Answers
  •  情深已故
    2020-12-19 16:38

    Athena is a distributed system, and it scales the execution of your query through a mechanism you cannot observe or control. It looks like it decided to use five workers for your CTAS query, and since each worker writes its own output file, you end up with five files in each partition.

    You could try explicitly specifying a bucket count of one (the bucketed_by and bucket_count table properties in the WITH clause), but you might still get multiple files, if I remember correctly.
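
To make the bucket suggestion concrete, here is a minimal sketch that renders a CTAS statement with a single bucket per partition. The helper build_ctas and every table, column, and S3 name below are hypothetical placeholders, not taken from the question. The only Athena-specific parts are the partitioned_by, bucketed_by, and bucket_count table properties. As noted above, this is not guaranteed to produce exactly one file per partition.

```python
def build_ctas(table, location, select_sql, partition_cols, bucket_col, bucket_count=1):
    """Render an Athena CTAS statement that requests `bucket_count` buckets.

    Hypothetical helper: it only builds the SQL text and does not run it.
    """
    # Athena expects the property values as SQL arrays of quoted column names.
    partitions = ", ".join(f"'{col}'" for col in partition_cols)
    return (
        f"CREATE TABLE {table}\n"
        "WITH (\n"
        "  format = 'PARQUET',\n"
        f"  external_location = '{location}',\n"
        f"  partitioned_by = ARRAY[{partitions}],\n"
        f"  bucketed_by = ARRAY['{bucket_col}'],\n"
        f"  bucket_count = {bucket_count}\n"
        ") AS\n"
        f"{select_sql}"
    )


# Placeholder names for illustration only. The partition column (dt) is kept
# last in the SELECT list, which Athena requires for partitioned CTAS tables.
query = build_ctas(
    table="my_db.events_compacted",
    location="s3://my-bucket/events_compacted/",
    select_sql="SELECT id, payload, dt FROM my_db.events",
    partition_cols=["dt"],
    bucket_col="id",
)
print(query)
```

The printed statement can be pasted into the Athena console or submitted through whatever client you already use. Pick a bucket column that is not one of the partition columns.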
