Partition by week/month/quarter/year to get over the partition limit?

醉梦人生 2020-11-27 20:37

I have 32 years of data that I want to put into a partitioned table. However, BigQuery says that I'm going over the limit (4000 partitions).
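To see why daily partitioning runs past the limit, here is a quick count of day partitions over a 32-year span. The 1988 to 2019 dates are a hypothetical placeholder (the question doesn't say which years), and 4000 is the limit quoted above.

```python
from datetime import date

PARTITION_LIMIT = 4000  # partition limit quoted in the question

# Hypothetical 32-year span; the question does not say which years.
start, end = date(1988, 1, 1), date(2019, 12, 31)

# Partitioning by day needs one partition per calendar day (both ends inclusive).
daily_partitions = (end - start).days + 1
print(daily_partitions, "daily partitions; over the limit:", daily_partitions > PARTITION_LIMIT)
```

Any 32-year span needs roughly 32 × 365 ≈ 11,700 daily partitions, nearly three times the limit.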

For a query like:

  •  执念已碎
    2020-11-27 21:09

    Instead of partitioning by day, you could partition by week/month/year.
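A rough way to check which coarser granularities fit under the limit is to truncate every day in the span and count the distinct results. The sketch below is a Python analogue of BigQuery's DATE_TRUNC for WEEK, MONTH, QUARTER and YEAR, not BigQuery itself; the `trunc` and `partition_count` helpers and the 1988 to 2019 span are hypothetical. WEEK truncates to the preceding Sunday, which is BigQuery's default week start.

```python
from datetime import date, timedelta

PARTITION_LIMIT = 4000  # limit quoted in the question


def trunc(d: date, unit: str) -> date:
    """Rough Python analogue of BigQuery's DATE_TRUNC for a few units (hypothetical helper)."""
    if unit == "WEEK":
        # weekday(): Monday=0 ... Sunday=6, so (weekday + 1) % 7 = days since the last Sunday
        return d - timedelta(days=(d.weekday() + 1) % 7)
    if unit == "MONTH":
        return d.replace(day=1)
    if unit == "QUARTER":
        # First month of the quarter: 1, 4, 7 or 10
        return d.replace(month=3 * ((d.month - 1) // 3) + 1, day=1)
    if unit == "YEAR":
        return d.replace(month=1, day=1)
    raise ValueError(f"unsupported unit: {unit}")


def partition_count(start: date, end: date, unit: str) -> int:
    """Number of distinct partitions needed to cover every day in [start, end]."""
    n_days = (end - start).days + 1
    return len({trunc(start + timedelta(days=i), unit) for i in range(n_days)})


# Hypothetical 32-year span (the actual years are not given).
start, end = date(1988, 1, 1), date(2019, 12, 31)
counts = {u: partition_count(start, end, u) for u in ("WEEK", "MONTH", "QUARTER", "YEAR")}

for unit, n in counts.items():
    print(f"{unit:8}{n:6}  within limit: {n <= PARTITION_LIMIT}")
```

Every granularity from week upward fits comfortably; the choice between them is then about how much data lands in each partition.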

    In my case each year contains ~3 GB of data, so I'll get the most benefit from clustering if I partition by year.
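The year-versus-month trade-off comes down to partition size: clustering sorts data within each partition, so finer partitions leave clustering less data per partition to organize. A back-of-the-envelope sketch, assuming the ~3 GB per year is spread evenly across the year (an assumption; real volume varies by month) and rounding a year down to 52 weeks:

```python
GB_PER_YEAR = 3.0  # "~3 GB" per year, the figure given in the answer

# Approximate partitions per year for each granularity (52 weeks is rounded down).
partitions_per_year = {"WEEK": 52, "MONTH": 12, "QUARTER": 4, "YEAR": 1}

# Assumes data is spread evenly through the year, which real data may not be.
partition_size_gb = {unit: GB_PER_YEAR / n for unit, n in partitions_per_year.items()}

for unit, size in partition_size_gb.items():
    print(f"{unit:8} ~{size:.2f} GB per partition")
```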

    For this, I'll create a DATE column truncated to the year and partition by it:

    CREATE TABLE `fh-bigquery.flights.ontime_201903`
    PARTITION BY FlightDate_year
    CLUSTER BY Origin, Dest 
    AS
    SELECT *, DATE_TRUNC(FlightDate, YEAR) AS FlightDate_year
    FROM `fh-bigquery.flights.raw_load_fixed`
    

    Note that I created the extra column DATE_TRUNC(FlightDate, YEAR) AS FlightDate_year in the process.

    Table stats:

    Since the table is clustered, I'll get the benefits of partitioning even if I don't use the partitioning column (year) as a filter:

    SELECT *
    FROM `fh-bigquery.flights.ontime_201903`
    WHERE FlightDate BETWEEN '2008-01-01' AND '2008-01-10'
    
    Predicted cost: 83.4 GB
    Actual cost: 3.2 GB
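The gap between the two numbers is typical of clustered tables: as far as I understand BigQuery's behavior, the estimate shown before the query runs cannot account for the blocks that clustering will let it skip, so it acts as an upper bound. Using only the two figures above, the actual scan works out as:

```python
predicted_gb = 83.4  # "Predicted cost" shown before running the query
actual_gb = 3.2      # "Actual cost" reported after the query ran

fraction_scanned = actual_gb / predicted_gb
reduction = predicted_gb / actual_gb
print(f"scanned {fraction_scanned:.1%} of the estimate (~{reduction:.0f}x less)")
```

This prints `scanned 3.8% of the estimate (~26x less)`.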
    
