Question
I have a dataset containing tables with similar table names ending in yyyymmdd. For example:
myproject:mydataset.Sales20140815
myproject:mydataset.Sales20140816
myproject:mydataset.Sales20140817
myproject:mydataset.Sales20140818
...
myproject:mydataset.Sales20140903
myproject:mydataset.Sales20140904
Is there any way to write a BigQuery query that reads the latest table in the dataset (for the above example, that is myproject:mydataset.Sales20140904)?
Answer 1:
N.N.'s answer is good, but relying on the modification date is problematic: if an old set of data is reimported, that table would erroneously be pulled as the "latest". Since the table_id explicitly carries the date, and yyyymmdd suffixes sort in chronological order, it is best to use that value directly.
SELECT
  *
FROM
  TABLE_QUERY(MyDATASET,
    'table_id CONTAINS "MyTable"
     AND table_id = (SELECT MAX(table_id)
                     FROM MyDATASET.__TABLES__
                     WHERE table_id CONTAINS "MyTable")'
  )
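To illustrate why MAX(table_id) picks the right table, here is a minimal Python sketch of the same selection done on a plain list of table names. The function name latest_table and the sample list are made up for illustration; this is not part of any BigQuery API. It only shows that zero-padded yyyymmdd suffixes compare as strings in the same order as the dates they represent.

```python
def latest_table(table_ids, prefix):
    """Return the id with the greatest yyyymmdd suffix, or None if nothing matches."""
    # Keep only ids that are the prefix followed by exactly eight digits,
    # so names such as "SalesLatestDay" are ignored.
    candidates = [
        t for t in table_ids
        if t.startswith(prefix)
        and len(t) == len(prefix) + 8
        and t[len(prefix):].isdigit()
    ]
    # Zero-padded yyyymmdd strings sort lexicographically in chronological
    # order, so max() on the names selects the most recent date.
    return max(candidates, default=None)


# Hypothetical sample data modelled on the table names in the question.
tables = ["Sales20140815", "Sales20140904", "Sales20140903", "SalesLatestDay"]
print(latest_table(tables, "Sales"))  # Sales20140904
```

Note that the sketch filters more strictly than CONTAINS does in the query: it requires the prefix at the start and an eight-digit suffix, which avoids picking up unrelated tables that merely contain the same text.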
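Answer 2:
If you have to use the standard SQL dialect (which is highly recommended by the BigQuery team), it should be something like this. Note that standard SQL separates the project from the dataset with a dot rather than a colon, and that the wildcard should include the Sales prefix so that only those tables are matched:
#standardSQL
SELECT * FROM `myproject.mydataset.Sales*`
WHERE _TABLE_SUFFIX = (SELECT MAX(_TABLE_SUFFIX) FROM `myproject.mydataset.Sales*`)
One benefit of this is that you can also expose in the result which table was queried:
#standardSQL
SELECT _TABLE_SUFFIX AS source, t.* FROM `myproject.mydataset.Sales*` t
WHERE _TABLE_SUFFIX = (SELECT MAX(_TABLE_SUFFIX) FROM `myproject.mydataset.Sales*`)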
Answer 3:
I'd use a table wildcard function. If the latest table is today's table, use:
SELECT * FROM TABLE_DATE_RANGE(MyDATASET.PREFIX, CURRENT_TIMESTAMP(), CURRENT_TIMESTAMP())
If the last changed table could be from a past date, you can use:
SELECT
  *
FROM
  TABLE_QUERY(MyDATASET,
    'table_id CONTAINS "MyTable"
     AND last_modified_time = (SELECT MAX(last_modified_time)
                               FROM MyDATASET.__TABLES__
                               WHERE table_id CONTAINS "MyTable")'
  )
Hope this helps...
Answer 4:
SELECT *
FROM TABLE_QUERY(myproject:mydataset,
  'table_id IN (
    SELECT table_id FROM myproject:mydataset.__TABLES__
    WHERE REGEXP_MATCH(table_id, r"^Sales.*")
    ORDER BY creation_time DESC LIMIT 1)')
Answer 5:
The only solutions I can think of involve modifications to your daily ETL:
A: Update your ETL to create a copy of the latest table once it's been loaded or updated. If you're using the bq command-line tool, that would be something like:
bq cp mydataset.Sales20140904 mydataset.SalesLatestDay
Then you just query against the SalesLatestDay table.
B: Better yet, create a View that references your most recent table ( "SELECT * FROM mydataset.Sales20140904" ), and update it daily. Info on creating views using the REST API: https://developers.google.com/bigquery/docs/reference/v2/tables#resource
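As a rough sketch of option B, the daily ETL step could build the view's query text from the load date before updating the view definition. The helper name latest_view_query and its parameters are hypothetical, and the actual call that updates the view (REST API, client library, or bq) is deliberately left out:

```python
from datetime import date


def latest_view_query(dataset, prefix, day):
    """Build the query text for a view over the daily table for `day`."""
    # The table name is the prefix followed by the date as yyyymmdd,
    # matching the naming scheme in the question.
    return "SELECT * FROM {}.{}{}".format(dataset, prefix, day.strftime("%Y%m%d"))


# Hypothetical usage: the ETL would pass the date it has just loaded.
print(latest_view_query("mydataset", "Sales", date(2014, 9, 4)))
# SELECT * FROM mydataset.Sales20140904
```

The resulting string would then be supplied as the view's query when the ETL updates it each day.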
Answer 6:
If your table is reliably updated every day, here is my trick:
SELECT * FROM TABLE_DATE_RANGE(myproject:mydataset.Sales, CURRENT_TIMESTAMP(), CURRENT_TIMESTAMP())
Source: https://stackoverflow.com/questions/25676049/query-latest-table-in-the-bigquery-dataset