问题
I have the following table mytable in Hive:
id radar_id car_id datetime
1 A21 123 2017-03-08 17:31:19.0
2 A21 555 2017-03-08 17:32:00.0
3 A21 777 2017-03-08 17:33:00.0
4 B15 123 2017-03-08 17:35:22.0
5 B15 555 2017-03-08 17:34:05.0
5 B15 777 2017-03-08 20:50:12.0
6 A21 123 2017-03-09 11:00:00.0
7 C11 123 2017-03-09 11:10:00.0
8 A21 123 2017-03-09 11:12:00.0
9 A21 555 2017-03-09 11:12:10.0
10 B15 123 2017-03-09 11:14:00.0
11 C11 555 2017-03-09 11:20:00.0
I want to get the routes of cars passing through radars A21 and B15 within the same trip. For example, if the date is different for the same car_id, then it is not the same trip. Basically, I want to consider that the maximum time difference between radars A21 and B15 for the same vehicle should be 30 minutes. If it's bigger, then the trip is not the same, like for example for the car_id 777.
My final goal is to count the average number of trips per day (non-unique, so if the same car passed 2 times by the same route, then it should be calculated 2 times).
The expected result is the following one:
radar_start radar_end avg_tripscount_per_day
A21 B15 1.5
On the date 2017-03-08 there are 2 trips between radars A21 and B15 (car 777 is not considered due to 30 minutes limit), while on the date 2017-03-09 there is only 1 trip. The average is 2+1=1.5 trips per day.
How can I get this result? Basically, I do not know how to introduce 30 minutes limit in the query and how to group rides by radar_start and radar_end.
Thanks.
Update:
- The trip is registered at the date it started.
- If the car was triggered by radar
A21at2017-03-08 23:55and by radarB15at2017-03-09 00:15, then it should be considered as the same trip registered for the date2017-03-08. - In case of
ids6 and 8 the same car123passed byA21two times, and then it turned toB15(id10). The last ride withid8 should be considered. So,8-10. Thus, the closest previous toB15. The interpretation is that a car passed byA21two times and the second time is turned toB15.
回答1:
select count(*) / count(distinct to_date(datetime)) as trips_per_day
from (select radar_id
,datetime
,lead(radar_id) over w as next_radar_id
,lead(datetime) over w as next_datetime
from mytable
where radar_id in ('A21','B15')
window w as
(
partition by car_id
order by datetime
)
) t
where radar_id = 'A21'
and next_radar_id = 'B15'
and datetime + interval '30' minutes >= next_datetime
;
+----------------+
| trips_per_day |
+----------------+
| 1.5 |
+----------------+
P.s.
If your version does not support intervals, the last code record could be replaced by -and to_unix_timestamp(datetime) + 30*60 > to_unix_timestamp(next_datetime)
回答2:
I missed that you're using Hive so started writing query for SQL-Server, but maybe it will help for you. Try something like this:
QUERY
select radar_start,
radar_end,
convert(decimal(6,3), count(*)) / convert(decimal(6,3), count(distinct dt)) as avg_tripscount_per_day
from (
select
t1.radar_id as radar_start,
t2.radar_id as radar_end,
convert(date, t1.[datetime]) dt,
row_number() over (partition by t1.radar_id, t1.car_id, convert(date, t1.[datetime]) order by t1.[datetime] desc) rn1,
row_number() over (partition by t2.radar_id, t2.car_id, convert(date, t2.[datetime]) order by t2.[datetime] desc) rn2
from trips as t1
join trips as t2 on t1.car_id = t2.car_id
and datediff(minute,t1.[datetime], t2.[datetime]) between 0 and 30
and t1.radar_id = 'A21'
and t2.radar_id = 'B15'
)x
where rn1 = 1 and rn2 = 1
group by radar_start, radar_end
OUPUT
radar_start radar_end avg_tripscount_per_day
A21 B15 1.5000000000
SAMPLE DATA
create table trips
(
id int,
radar_id char(3),
car_id int,
[datetime] datetime
)
insert into trips values
(1,'A21',123,'2017-03-08 17:31:19.0'),
(2,'A21',555,'2017-03-08 17:32:00.0'),
(3,'A21',777,'2017-03-08 17:33:00.0'),
(4,'B15',123,'2017-03-08 17:35:22.0'),
(5,'B15',555,'2017-03-08 17:34:05.0'),
(5,'B15',777,'2017-03-08 20:50:12.0'),
(6,'A21',123,'2017-03-09 11:00:00.0'),
(7,'C11',123,'2017-03-09 11:10:00.0'),
(8,'A21',123,'2017-03-09 11:12:00.0'),
(9,'A21',555,'2017-03-09 11:12:10.0'),
(8,'B15',123,'2017-03-09 11:14:00.0'),
(9,'C11',555,'2017-03-09 11:20:00.0')
来源:https://stackoverflow.com/questions/45481288/how-to-retrieve-trips-from-historical-data