Query latest table in the BigQuery dataset

Submitted by 人盡茶涼 on 2021-02-04 15:37:26

Question


I have a dataset containing tables with similar table names ending in yyyymmdd. For example:

myproject:mydataset.Sales20140815
myproject:mydataset.Sales20140816
myproject:mydataset.Sales20140817
myproject:mydataset.Sales20140818
...
myproject:mydataset.Sales20140903
myproject:mydataset.Sales20140904 

Is there any way to write a BigQuery query that reads only the latest table in the dataset (in the example above, myproject:mydataset.Sales20140904)?


Answer 1:


N.N.'s answer is good, but relying on the modification date is problematic: if an old set of data is reimported, that table would erroneously be pulled as the "latest". Since the table_id explicitly encodes the dates in the correct order, it is best to use that value directly.

SELECT 
  *
FROM 
TABLE_QUERY(MyDATASET, 
      'table_id CONTAINS "MyTable" 
      AND table_id= (Select MAX(table_id) 
                              FROM MyDATASET.__TABLES__
                              where table_id contains "MyTable")'
            )
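MAX(table_id) works here because the suffix is a fixed-width yyyymmdd string, so alphabetical order and chronological order coincide. A minimal Python sketch of the same selection, assuming the table names have already been listed somehow (the latest_table helper is hypothetical and not part of any BigQuery API):

```python
import re


def latest_table(table_ids, prefix):
    """Return the table whose yyyymmdd suffix is greatest, or None if none match."""
    # Keep only names of the form <prefix><8 digits>, e.g. Sales20140904.
    pattern = re.compile(re.escape(prefix) + r"\d{8}")
    dated = [t for t in table_ids if pattern.fullmatch(t)]
    # Fixed-width digit suffixes sort lexicographically in date order.
    return max(dated, default=None)


tables = ["Sales20140815", "Sales20140904", "Sales20140903", "SalesLatestDay"]
print(latest_table(tables, "Sales"))
```

The strict pattern also skips tables that share the prefix but carry no date, which a plain CONTAINS filter would let through.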



Answer 2:


If you have to use the standard SQL dialect (which is highly recommended by the BigQuery team), it should be something like this:

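#standardSQL
-- Standard SQL separates project and dataset with a dot, not a colon.
-- The Sales prefix limits the wildcard to the dated Sales tables.
select * from `myproject.mydataset.Sales*`
where _TABLE_SUFFIX = (select max(_TABLE_SUFFIX) from `myproject.mydataset.Sales*`)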

One benefit of this approach is that you can also expose the name of the table you queried in the result:

#standardSQL
-- Standard SQL separates project and dataset with a dot, not a colon.
select _TABLE_SUFFIX as source, t.* from `myproject.mydataset.Sales*` t
where _TABLE_SUFFIX = (select max(_TABLE_SUFFIX) from `myproject.mydataset.Sales*`)
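If the query has to be generated for several table families, the text can be assembled programmatically. The following is only a sketch: build_latest_table_query is a hypothetical helper, it only builds a string and does not talk to BigQuery, and it assumes the project, dataset and prefix come from trusted configuration rather than user input. Note that standard SQL writes the wildcard table as project.dataset.prefix* with dots.

```python
def build_latest_table_query(project, dataset, prefix):
    """Build a standard SQL query that reads only the newest <prefix>yyyymmdd table."""
    # Backticks are required around a wildcard table reference in standard SQL.
    wildcard = f"`{project}.{dataset}.{prefix}*`"
    return (
        "SELECT _TABLE_SUFFIX AS source, t.*\n"
        f"FROM {wildcard} AS t\n"
        f"WHERE _TABLE_SUFFIX = (SELECT MAX(_TABLE_SUFFIX) FROM {wildcard})"
    )


query = build_latest_table_query("myproject", "mydataset", "Sales")
print(query)
```

With the prefix Sales, _TABLE_SUFFIX holds only the part of the table name after the prefix, so the source column would contain values such as 20140904.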



Answer 3:


I'd use a table wildcard function. If the latest table is today's table, use:

Select * from TABLE_DATE_RANGE(MyDATASET.PREFIX, Current_Timestamp(), Current_Timestamp())

If the most recently changed table could be from a past date, you can use:

    SELECT 
      *
    FROM 
    TABLE_QUERY(MyDATASET, 
          'table_id CONTAINS "MyTable" 
          AND last_modified_time= (Select MAX(last_modified_time) 
                                  FROM MyDATASET.__TABLES__
                                  where table_id contains "MyTable")'
                )
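Selecting by last_modified_time and selecting by table name can disagree, which is the reimport caveat raised in Answer 1. A small Python sketch with made-up metadata (the timestamps below are purely illustrative, not real data) shows the difference:

```python
from datetime import datetime

# Hypothetical (table_id, last_modified_time) pairs; the oldest table
# was reloaded most recently.
tables = [
    ("Sales20140902", datetime(2014, 9, 2, 6, 0)),
    ("Sales20140903", datetime(2014, 9, 3, 6, 0)),
    ("Sales20140904", datetime(2014, 9, 4, 6, 0)),
    ("Sales20140815", datetime(2014, 9, 5, 9, 30)),  # reimported old data
]

# Newest by modification time: picks the reimported table.
by_modified = max(tables, key=lambda t: t[1])[0]
# Newest by name: picks the table with the latest date suffix.
by_name = max(tables, key=lambda t: t[0])[0]

print(by_modified)
print(by_name)
```

Which of the two is the "latest" table depends on whether you mean the most recent data date or the most recent load.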

Hope this helps...




Answer 4:


SELECT * 
FROM TABLE_QUERY(myproject:mydataset,
  'table_id IN (
     SELECT table_id FROM myproject:mydataset.__TABLES__  
     WHERE REGEXP_MATCH(table_id, r"^Sales.*")
     ORDER BY creation_time DESC LIMIT 1)')



Answer 5:


The only solutions I can think of involve modifications to your daily ETL:

A: update your ETL to create a copy of the latest table once it's been loaded or updated. If you're using bq command line tool that would be something like:

bq cp mydataset.Sales20140904 mydataset.SalesLatestDay

Then you just query against the SalesLatestDay table.

B: Better yet, create a View that references your most recent table ( "SELECT * FROM mydataset.Sales20140904" ), and update it daily. Info on creating views using the REST API: https://developers.google.com/bigquery/docs/reference/v2/tables#resource
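Both options boil down to a small step at the end of the daily load. The sketch below only builds the bq cp argument list for option A and the view query text for option B; refresh_commands is a hypothetical helper, and actually running the copy or updating the view is left to whatever your ETL already uses.

```python
def refresh_commands(dataset, latest_table, alias):
    """Return the bq copy command (option A) and the view query (option B)."""
    # Option A: copy the freshly loaded table over a fixed-name table.
    copy_cmd = ["bq", "cp", f"{dataset}.{latest_table}", f"{dataset}.{alias}"]
    # Option B: query text for a view that points at the newest table.
    view_query = f"SELECT * FROM {dataset}.{latest_table}"
    return copy_cmd, view_query


copy_cmd, view_query = refresh_commands("mydataset", "Sales20140904", "SalesLatestDay")
print(" ".join(copy_cmd))
print(view_query)
```

The view avoids storing a second copy of the data, at the cost of having to update the view definition every day.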




Answer 6:


If your table is reliably updated every day, here is my trick:

SELECT * FROM TABLE_DATE_RANGE(myproject:mydataset.Sales, CURRENT_TIMESTAMP(), CURRENT_TIMESTAMP())
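With both ends of the range set to the current timestamp, the query resolves to the single table named prefix plus today's date. A minimal Python sketch of that naming rule, assuming the date is taken in UTC (table_for_day is a hypothetical helper):

```python
from datetime import date, datetime, timezone


def table_for_day(prefix, day):
    """Return the daily table name <prefix>yyyymmdd for the given date."""
    return prefix + day.strftime("%Y%m%d")


# Fixed date from the question's example.
print(table_for_day("Sales", date(2014, 9, 4)))

# Today's table, assuming the day boundary is UTC.
today_utc = datetime.now(timezone.utc).date()
print(table_for_day("Sales", today_utc))
```

If the daily load has not finished yet, today's table does not exist, so this approach only suits pipelines that always complete before the query runs.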


Source: https://stackoverflow.com/questions/25676049/query-latest-table-in-the-bigquery-dataset
