How to avoid AWS Athena CTAS query creating small files?

借酒劲吻你 2020-12-19 15:59

I'm unable to figure out what is wrong with my CTAS query. It breaks the data into smaller files while storing it inside a partition, even though I haven't mentioned any bucke

2 answers
  •  [愿得一人]
    2020-12-19 16:54

    I was able to overcome the issue by adding a bucketing column, month_a, and bucketing the output on it. Below is the code:

    CREATE TABLE sampledb.yellow_trip_data_avro
    WITH (
        format = 'AVRO',
        external_location='s3://a4189e1npss3001/Athena/internal_tables/avro/',
        partitioned_by=ARRAY['year','month'],
        bucketed_by=ARRAY['month_a'],
        bucket_count=12
    ) AS SELECT
        VendorID,
        tpep_pickup_datetime,
        tpep_dropoff_datetime,
        passenger_count,
        trip_distance,
        RatecodeID,
        store_and_fwd_flag,
        PULocationID,
        DOLocationID,
        payment_type,
        fare_amount,
        extra,
        mta_tax,
        tip_amount,
        tolls_amount,
        improvement_surcharge,
        total_amount,
        date_format(date_parse(tpep_pickup_datetime, '%Y-%c-%d %k:%i:%s'),'%c') AS month_a,
        date_format(date_parse(tpep_pickup_datetime, '%Y-%c-%d %k:%i:%s'),'%Y') AS year,
        date_format(date_parse(tpep_pickup_datetime, '%Y-%c-%d %k:%i:%s'),'%c') AS month
    FROM sampleDB.yellow_trip_data_raw;
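
    To check whether the bucketing actually reduced the number of files, one option is to count the distinct files behind each partition. The query below is only a sketch: it assumes the table created by the CTAS statement above and relies on Athena's "$path" pseudo-column, which returns the S3 object each row was read from. The alias file_count is just an illustrative name.

    ```sql
    -- Count how many S3 objects back each (year, month) partition.
    -- "$path" is Athena's pseudo-column holding the source file of each row.
    SELECT
        year,
        month,
        count(DISTINCT "$path") AS file_count
    FROM sampledb.yellow_trip_data_avro
    GROUP BY year, month
    ORDER BY year, month;
    ```

    Because month_a holds the same value as month within any one partition, all rows of a partition should hash to the same bucket, so a small file_count per partition would be the expected result. Files that contain no rows will not appear in this count.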
    
