BigQuery filter per the last Date and use Partition

好久不见. Submitted on 2020-01-21 22:18:38

Question


I asked how to filter on the last date and got excellent answers (BigQuery, how to use alias in where clause?). They all work, but they scan the whole table. The field SETTLEMENTDATE is the partitioning field. Is there a way to scan only one partition?

As an example, I am using this query:

#standardSQL
SELECT * EXCEPT(isLastDate) 
FROM (
  SELECT *, DATE(SETTLEMENTDATE) = MAX(DATE(SETTLEMENTDATE)) OVER() isLastDate
  FROM `biengine-252003.aemo2.daily`
)
WHERE isLastDate 

Edit: please note that the last date is not always the current date, as there is a lag in the data.


Answer 1:


Now that scripting is in beta in BigQuery, you can declare a variable that contains the target date. Here's an example:

DECLARE max_date DATE DEFAULT (SELECT DATE(MAX(datehour)) FROM `fh-bigquery.wikipedia_v3.pageviews_2019` WHERE wiki='es');

SELECT MAX(views)
FROM `fh-bigquery.wikipedia_v3.pageviews_2019` 
WHERE DATE(datehour) = max_date
AND wiki='es'



Answer 2:


Assuming SETTLEMENTDATE is of the DATE data type, you can use the query below to read only today's partition:

SELECT *
FROM `biengine-252003.aemo2.daily`
WHERE SETTLEMENTDATE = CURRENT_DATE()     

Or, for example, for yesterday's partition:

SELECT *
FROM `biengine-252003.aemo2.daily`
WHERE SETTLEMENTDATE = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)     

See more at https://cloud.google.com/bigquery/docs/querying-partitioned-tables#querying_partitioned_tables_2




Answer 3:


Mikhail's answer looks like this (working on public data):

SELECT MAX(views)
FROM `fh-bigquery.wikipedia_v3.pageviews_2019` 
WHERE DATE(datehour) = DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)     
AND wiki='es' 
# 122.2 MB processed

But it seems the question wants something like this:

SELECT MAX(views)
FROM `fh-bigquery.wikipedia_v3.pageviews_2019` 
WHERE DATE(datehour) = (SELECT DATE(MAX(datehour)) FROM `fh-bigquery.wikipedia_v3.pageviews_2019` WHERE wiki='es')     
AND wiki='es'
# 50.6 GB processed

... but for way less than 50.6GB

What you need now is some sort of scripting, to perform this in 2 steps:

max_date = (SELECT DATE(MAX(datehour)) FROM `fh-bigquery.wikipedia_v3.pageviews_2019` WHERE wiki='es')   

;

SELECT MAX(views)
FROM `fh-bigquery.wikipedia_v3.pageviews_2019` 
WHERE DATE(datehour) = {{max_date}}
AND wiki='es'
# 115.2 MB processed

You will have to script this outside BigQuery - or wait for news on https://issuetracker.google.com/issues/36955074.
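
As a minimal sketch of scripting the two steps outside BigQuery, the Python below first asks for the latest date and then inlines it as a constant, so the second query can prune partitions. The function names are hypothetical, and the `client` argument is assumed to behave like `google.cloud.bigquery.Client` (a `query(sql)` call whose job exposes `result()`). It also assumes the partitioning column is a TIMESTAMP or DATETIME, as in the question's query. Note that step 1 still has to read the partitioning column across the table, so only step 2 is limited to one partition.

```python
import datetime


def build_max_date_sql(table: str, ts_column: str) -> str:
    """Step 1: find the most recent date present in the table."""
    return f"SELECT DATE(MAX({ts_column})) AS max_date FROM `{table}`"


def build_partition_sql(table: str, ts_column: str, max_date: datetime.date) -> str:
    """Step 2: filter on a constant date so partition pruning can apply."""
    # isoformat() on a datetime.date always yields YYYY-MM-DD,
    # so inlining it as a DATE literal is safe here.
    return (
        f"SELECT * FROM `{table}` "
        f"WHERE DATE({ts_column}) = DATE '{max_date.isoformat()}'"
    )


def fetch_latest_partition(client, table: str, ts_column: str):
    """Run both steps; `client` only needs a query(sql).result() interface."""
    step1_rows = client.query(build_max_date_sql(table, ts_column)).result()
    max_date = next(iter(step1_rows))["max_date"]
    return client.query(build_partition_sql(table, ts_column, max_date)).result()


# Example usage (assumes google-cloud-bigquery is installed and
# application default credentials are configured):
#   from google.cloud import bigquery
#   rows = fetch_latest_partition(
#       bigquery.Client(), "biengine-252003.aemo2.daily", "SETTLEMENTDATE")
#   for row in rows:
#       print(dict(row.items()))
```

Inlining the date mirrors the {{max_date}} placeholder above. A query parameter would be the more general choice if the value came from user input.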



Source: https://stackoverflow.com/questions/57862114/bigquery-filter-per-the-last-date-and-use-partition
