BigQuery select data within a time interval

Submitted by 时光怂恿深爱的人放手 on 2019-12-13 02:15:03

Question


My data looks like this:

name | From  | To_City | Date of request
Andy | Paris | London  | 08/21/2014 12:00
Lena | Koln  | Berlin  | 08/22/2014 18:00
Andy | Paris | London  | 08/22/2014 06:00
Lisa | Rome  | Neapel  | 08/25/2014 18:00
Lena | Rome  | London  | 08/21/2014 20:00
Lisa | Rome  | Neapel  | 08/24/2014 18:00
Andy | Paris | London  | 08/25/2014 12:00

I want to find how many identical drive requests a person had within +/- one day. I'd love to receive a table saying:

name | From  | To_City | avg Date of request | # requests
Andy | Paris | London  | 08/21/2014 21:00    | 2
Lena | Koln  | Berlin  | 08/22/2014 18:00    | 1
Lisa | Rome  | Neapel  | 08/25/2014 06:00    | 2
Lena | Rome  | London  | 08/21/2014 20:00    | 1
Andy | Paris | London  | 08/25/2014 12:00    | 1

This would be the result of a GROUP BY clause. But is it feasible in general to write a condition that checks whether, and how many, identical requests occur within 24 hours of an initial request? So far I have been downloading the data into Excel and doing it there, but there is a lot of data, so that is not efficient.

Sample data:

Let's build a sample dataset first:

select * from (select 'Andy' as name,'Paris' as f,'London' as to, '2014-08-21 12:00' as date),
(select 'Lena' as name,'Koln' as f,'Berlin' as to, '2014-08-22 18:00' as date),
(select 'Andy' as name,'Paris' as f,'London' as to, '2014-08-22 06:00' as date),
(select 'Lisa' as name,'Rome' as f,'Neapel' as to, '2014-08-25 18:00' as date),
(select 'Lena' as name,'Rome' as f,'London' as to, '2014-08-21 20:00' as date),
(select 'Lisa' as name,'Rome' as f,'Neapel' as to, '2014-08-24 18:00' as date),
(select 'Andy' as name,'Paris' as f,'London' as to, '2014-08-25 12:00' as date)

Answer 1:


One way to do it is to use a window function with a RANGE window. RANGE needs a numeric ordering column, so the dates are first converted to integer day numbers. The PARTITION BY clause plays a role similar to GROUP BY: it lists the columns that define "identical" drive requests (in your case name, from and to). COUNT(*) then counts the requests that fall inside each row's window, that is, requests whose day number is at most one before or after that row's.

select name, f, to, date,
       count(*) over(partition by name, f, to
                     order by day
                     range between 1 preceding and 1 following) as requests
from (
select name, f, to, date, integer(timestamp(date)/1000000/60/60/24) day from
(select 'Andy' as name,'Paris' as f,'London' as to, '2014-08-21 12:00' as date),
(select 'Lena' as name,'Koln' as f,'Berlin' as to, '2014-08-22 18:00' as date),
(select 'Andy' as name,'Paris' as f,'London' as to, '2014-08-22 06:00' as date),
(select 'Lisa' as name,'Rome' as f,'Neapel' as to, '2014-08-25 18:00' as date),
(select 'Lena' as name,'Rome' as f,'London' as to, '2014-08-21 20:00' as date),
(select 'Lisa' as name,'Rome' as f,'Neapel' as to, '2014-08-24 18:00' as date),
(select 'Andy' as name,'Paris' as f,'London' as to, '2014-08-25 12:00' as date))
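
To check the window logic outside BigQuery, here is a minimal Python sketch that imitates the query on the sample rows. It is not BigQuery code: the helper names (`day_number`, `second_number`, `window_counts`) are made up for illustration, and it assumes the integer conversion truncates to whole days. It also includes a variant that orders by seconds with a width of 86400, which is one way to get a strict 24-hour window. With day numbers, two requests on neighbouring calendar days count together even if they are almost 48 hours apart.

```python
from datetime import datetime

# Sample rows from the question: (name, from, to, request time).
ROWS = [
    ("Andy", "Paris", "London", "2014-08-21 12:00"),
    ("Lena", "Koln", "Berlin", "2014-08-22 18:00"),
    ("Andy", "Paris", "London", "2014-08-22 06:00"),
    ("Lisa", "Rome", "Neapel", "2014-08-25 18:00"),
    ("Lena", "Rome", "London", "2014-08-21 20:00"),
    ("Lisa", "Rome", "Neapel", "2014-08-24 18:00"),
    ("Andy", "Paris", "London", "2014-08-25 12:00"),
]

EPOCH = datetime(1970, 1, 1)


def parse(text):
    return datetime.strptime(text, "%Y-%m-%d %H:%M")


def day_number(text):
    """Whole days since 1970-01-01 (assumed to match the integer `day` column)."""
    return (parse(text) - EPOCH).days


def second_number(text):
    """Whole seconds since 1970-01-01, for a strict 24-hour window."""
    return int((parse(text) - EPOCH).total_seconds())


def window_counts(rows, to_number, width):
    """For each row, count the rows with the same (name, from, to) whose
    ordering value lies within `width` of this row's value. This imitates
    PARTITION BY name, f, to ... RANGE BETWEEN width PRECEDING AND width FOLLOWING."""
    numbers = [to_number(row[3]) for row in rows]
    counts = []
    for i, row in enumerate(rows):
        counts.append(sum(
            1
            for j, other in enumerate(rows)
            if other[:3] == row[:3] and abs(numbers[j] - numbers[i]) <= width
        ))
    return counts


by_day = window_counts(ROWS, day_number, 1)
by_24h = window_counts(ROWS, second_number, 24 * 60 * 60)
for row, d, h in zip(ROWS, by_day, by_24h):
    print(row, d, h)
```

On the sample data both variants give the counts 2, 1, 2, 2, 1, 2, 1 in row order. They differ only for requests that sit on neighbouring calendar days but more than 24 hours apart.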



Answer 2:


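You could truncate the date to exclude the hours, minutes and seconds, then group by that column. Here date_of_request stands in for your actual date column; SUBSTR counts from 1, and the date part is 10 characters long.

SELECT SUBSTR(STRING(date_of_request), 1, 10) AS day, COUNT(*) AS requests
FROM t1
GROUP BY day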
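
As a rough check of what calendar-day grouping does on the question's sample rows, here is a small Python sketch. It is not BigQuery code, and the function name `daily_counts` is made up for illustration. It keeps the first 10 characters of each timestamp (the YYYY-MM-DD part) and counts requests per person, route and day.

```python
from collections import Counter

# Sample rows from the question: (name, from, to, request time).
ROWS = [
    ("Andy", "Paris", "London", "2014-08-21 12:00"),
    ("Lena", "Koln", "Berlin", "2014-08-22 18:00"),
    ("Andy", "Paris", "London", "2014-08-22 06:00"),
    ("Lisa", "Rome", "Neapel", "2014-08-25 18:00"),
    ("Lena", "Rome", "London", "2014-08-21 20:00"),
    ("Lisa", "Rome", "Neapel", "2014-08-24 18:00"),
    ("Andy", "Paris", "London", "2014-08-25 12:00"),
]


def daily_counts(rows):
    """Count requests per (name, from, to, calendar day)."""
    return Counter((name, f, to, date[:10]) for name, f, to, date in rows)


counts = daily_counts(ROWS)
for key, n in sorted(counts.items()):
    print(key, n)
```

On this sample every group has exactly one request. Andy's requests on 08/21 at 12:00 and 08/22 at 06:00 are only 18 hours apart, yet they land in different groups because they fall on different calendar days. Grouping by day therefore answers a slightly different question than a +/- one day window.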


Source: https://stackoverflow.com/questions/29899097/bigquery-select-data-within-a-time-interval
