How to choose the latest partition in a BigQuery table?

暗喜 2020-12-09 21:21

I am trying to select data from the latest partition in a date-partitioned BigQuery table, but the query still reads data from the whole table.

I've tried (as far a

7 answers
  •  青春惊慌失措
    2020-12-09 21:48

    October 2019 Update

    Support for scripting and stored procedures is now in beta (as of October 2019).

    You can now submit multiple statements separated by semicolons, and BigQuery will run them in sequence.

    See the example below:

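    -- Step 1: store the latest partition timestamp in a variable.
    DECLARE max_date TIMESTAMP;
    SET max_date = (
      SELECT MAX(_PARTITIONTIME) FROM `project.dataset.partitioned_table`);
    
    -- Step 2: filter the final query on that variable.
    SELECT * FROM `project.dataset.partitioned_table`
    WHERE _PARTITIONTIME = max_date;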
    

    Update for those who downvote without checking the context:

    I think this answer was accepted because it addressed the OP's main question: "Is there a way I can pull data only from the latest partition in BigQuery?" The comments already noted that the BigQuery engine still scans all rows and only returns results from the most recent partition. As mentioned in a comment on the question, this is easily addressed by scripting the logic: first get the result of the subquery, then use it in the final query.

    Try (legacy SQL syntax):

    SELECT * FROM [dataset.partitioned_table]
    WHERE _PARTITIONTIME IN (
      SELECT MAX(TIMESTAMP(partition_id))
      FROM [dataset.partitioned_table$__PARTITIONS_SUMMARY__]
    )  
    

    or

    SELECT * FROM [dataset.partitioned_table]
    WHERE _PARTITIONTIME IN (
      SELECT MAX(_PARTITIONTIME) 
      FROM [dataset.partitioned_table]
    )  
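
The bracketed table references above are legacy SQL. As a rough standard SQL counterpart to the `$__PARTITIONS_SUMMARY__` lookup, the sketch below reads partition metadata from the `INFORMATION_SCHEMA.PARTITIONS` view and then filters on a script variable. It assumes a table partitioned daily by ingestion time, so that `partition_id` values look like `20201209`; the project, dataset, and table names are the same placeholders used in the answer, and the variable name `latest_partition` is made up for illustration.

```sql
-- Assumption: daily ingestion-time partitions, so partition_id is 'YYYYMMDD'.
DECLARE latest_partition TIMESTAMP;

-- Look up the newest partition from table metadata instead of the table itself.
SET latest_partition = (
  SELECT TIMESTAMP(PARSE_DATE('%Y%m%d', MAX(partition_id)))
  FROM `project.dataset.INFORMATION_SCHEMA.PARTITIONS`
  WHERE table_name = 'partitioned_table'
    -- Skip the special partitions that hold NULL or unpartitioned rows.
    AND partition_id NOT IN ('__NULL__', '__UNPARTITIONED__')
);

-- Filter the final query on the variable.
SELECT *
FROM `project.dataset.partitioned_table`
WHERE _PARTITIONTIME = latest_partition;
```

For hourly, monthly, or yearly partitioning the `partition_id` format differs, so the `PARSE_DATE` format string would need to change accordingly.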
    
