Question
My data looks like this:
name | From  | To_City | Date of request
Andy | Paris | London  | 08/21/2014 12:00
Lena | Koln  | Berlin  | 08/22/2014 18:00
Andy | Paris | London  | 08/22/2014 06:00
Lisa | Rome  | Neapel  | 08/25/2014 18:00
Lena | Rome  | London  | 08/21/2014 20:00
Lisa | Rome  | Neapel  | 08/24/2014 18:00
Andy | Paris | London  | 08/25/2014 12:00
I want to find how many identical drive requests a person had within +/- one day. I'd love to receive a table saying:
name | From  | To_City | avg Date of request | # requests
Andy | Paris | London  | 08/21/2014 21:00    | 2
Lena | Koln  | Berlin  | 08/22/2014 18:00    | 1
Lisa | Rome  | Neapel  | 08/25/2014 06:00    | 2
Lena | Rome  | London  | 08/21/2014 20:00    | 1
Andy | Paris | London  | 08/25/2014 12:00    | 1
This would be the result of a GROUP BY clause. But is it feasible in general to write a condition that checks whether, and how many, identical requests there are within 24 hours of an initial request? For now I download the data into Excel and do it there, but there is a lot of data, so that is not efficient.
Sample data:
Let's build a sample dataset first:
select * from (select 'Andy' as name,'Paris' as f,'London' as to, '2014-08-21 12:00' as date),
(select 'Lena' as name,'Koln' as f,'Berlin' as to, '2014-08-22 18:00' as date),
(select 'Andy' as name,'Paris' as f,'London' as to, '2014-08-22 06:00' as date),
(select 'Lisa' as name,'Rome' as f,'Neapel' as to, '2014-08-25 18:00' as date),
(select 'Lena' as name,'Rome' as f,'London' as to, '2014-08-21 20:00' as date),
(select 'Lisa' as name,'Rome' as f,'Neapel' as to, '2014-08-24 18:00' as date),
(select 'Andy' as name,'Paris' as f,'London' as to, '2014-08-25 12:00' as date)
Answer 1:
One way to do it is to use a window function with a RANGE window. To do that, the dates first need to be converted to day numbers, because RANGE requires the ordering column to be numeric. The PARTITION BY clause is similar to GROUP BY: it lists the columns that define "identical" drive requests (in your case name, from and to). Then you can simply use COUNT(*) to count the number of requests within each such window. Note that the count is reported for every row, and that the window covers adjacent day numbers rather than an exact 24-hour span.
select name, f, to, date,
  count(*) over(partition by name, f, to
                order by day
                range between 1 preceding and 1 following) as requests
from (
  select name, f, to, date, integer(timestamp(date)/1000000/60/60/24) day from
(select 'Andy' as name,'Paris' as f,'London' as to, '2014-08-21 12:00' as date),
(select 'Lena' as name,'Koln' as f,'Berlin' as to, '2014-08-22 18:00' as date),
(select 'Andy' as name,'Paris' as f,'London' as to, '2014-08-22 06:00' as date),
(select 'Lisa' as name,'Rome' as f,'Neapel' as to, '2014-08-25 18:00' as date),
(select 'Lena' as name,'Rome' as f,'London' as to, '2014-08-21 20:00' as date),
(select 'Lisa' as name,'Rome' as f,'Neapel' as to, '2014-08-24 18:00' as date),
(select 'Andy' as name,'Paris' as f,'London' as to, '2014-08-25 12:00' as date))
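To see what the RANGE window computes, here is a minimal Python sketch (not BigQuery code) that emulates it on the sample data. It assumes the integer cast in the query truncates to whole days since 1970-01-01. The helper names `day_number` and `window_counts` are made up for illustration.

```python
from datetime import datetime

# Sample rows from the question: (name, from, to, request time).
rows = [
    ("Andy", "Paris", "London", "2014-08-21 12:00"),
    ("Lena", "Koln", "Berlin", "2014-08-22 18:00"),
    ("Andy", "Paris", "London", "2014-08-22 06:00"),
    ("Lisa", "Rome", "Neapel", "2014-08-25 18:00"),
    ("Lena", "Rome", "London", "2014-08-21 20:00"),
    ("Lisa", "Rome", "Neapel", "2014-08-24 18:00"),
    ("Andy", "Paris", "London", "2014-08-25 12:00"),
]


def day_number(text):
    """Whole days since 1970-01-01 (assumed equivalent of the `day` column)."""
    moment = datetime.strptime(text, "%Y-%m-%d %H:%M")
    return (moment - datetime(1970, 1, 1)).days


def window_counts(rows):
    """For each row, count rows in the same partition whose day is within +/- 1."""
    result = []
    for name, src, dst, when in rows:
        day = day_number(when)
        count = sum(
            1
            for n, s, d, w in rows
            # Same partition (name, from, to) and day inside the RANGE frame.
            if (n, s, d) == (name, src, dst) and abs(day_number(w) - day) <= 1
        )
        result.append((name, src, dst, when, count))
    return result


for row in window_counts(rows):
    print(row)
```

On the sample data this gives a count of 2 for Andy's requests on 08/21 and 08/22 and for both of Lisa's requests, and 1 for every other row. The pairs are not collapsed into a single row; that would need an extra grouping step.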
Answer 2:
You could truncate the date to exclude the hours, minutes and seconds, then group by that column. Note that this buckets requests by calendar day rather than by a rolling 24-hour window.
SELECT SUBSTR(STRING(date_of_request), 1, 10) AS day, COUNT(*) AS requests
FROM t1
GROUP BY day
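The difference between calendar-day buckets and "within 24 hours of an initial request" can be checked with a small Python sketch (not BigQuery code). It assumes a group starts at a route's earliest ungrouped request and includes every later request at most 24 hours after that first one. The function names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Sample rows from the question: (name, from, to, request time).
rows = [
    ("Andy", "Paris", "London", "2014-08-21 12:00"),
    ("Lena", "Koln", "Berlin", "2014-08-22 18:00"),
    ("Andy", "Paris", "London", "2014-08-22 06:00"),
    ("Lisa", "Rome", "Neapel", "2014-08-25 18:00"),
    ("Lena", "Rome", "London", "2014-08-21 20:00"),
    ("Lisa", "Rome", "Neapel", "2014-08-24 18:00"),
    ("Andy", "Paris", "London", "2014-08-25 12:00"),
]


def group_by_calendar_day(rows):
    """Count requests per (name, from, to, day), like grouping on a truncated date."""
    counts = defaultdict(int)
    for name, src, dst, when in rows:
        counts[(name, src, dst, when[:10])] += 1
    return dict(counts)


def summarize(route, group):
    """Return the route plus the average request time and the request count."""
    total_offset = sum((t - group[0] for t in group), timedelta())
    average = group[0] + total_offset / len(group)
    return route + (average.strftime("%Y-%m-%d %H:%M"), len(group))


def group_within_24h(rows):
    """Group each route's requests that fall within 24 hours of the group's first one."""
    by_route = defaultdict(list)
    for name, src, dst, when in rows:
        by_route[(name, src, dst)].append(datetime.strptime(when, "%Y-%m-%d %H:%M"))
    result = []
    for route, times in by_route.items():
        times.sort()
        group = [times[0]]
        for t in times[1:]:
            if t - group[0] <= timedelta(hours=24):
                group.append(t)  # still within 24 hours of the initial request
            else:
                result.append(summarize(route, group))
                group = [t]  # this request starts a new group
        result.append(summarize(route, group))
    return result


print(group_by_calendar_day(rows))
for row in group_within_24h(rows):
    print(row)
```

With calendar-day buckets, every sample request ends up alone, because Andy's 08/21 12:00 and 08/22 06:00 requests fall on different dates. The 24-hour grouping reproduces the table asked for in the question, including the averaged request times.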
Source: https://stackoverflow.com/questions/29899097/bigquery-select-data-within-a-time-interval