Question
I am using Presto and Zeppelin. I have a lot of raw data that I need to summarize.
I want to group rows into 5-second intervals and count the rows in each interval.
serviceType  logType  date
------------------------------------------------------
service1     log1     2017-10-24 23:00:23.206
service1     log1     2017-10-24 23:00:23.207
service1     log1     2017-10-24 23:00:25.206
service2     log1     2017-10-24 23:00:24.206
service1     log2     2017-10-24 23:00:27.206
service1     log2     2017-10-24 23:00:29.302
The expected result is:
serviceType  logType  date                 cnt
--------------------------------------------------------------
service1     log1     2017-10-24 23:00:20  2
service2     log1     2017-10-24 23:00:20  1
service1     log1     2017-10-24 23:00:25  1
service1     log2     2017-10-24 23:00:25  2
First, I have to migrate the stored data to new tables.
Second, I have to group incoming data and save it to the new table in real time.
I am finding it hard to write the SQL script.
Please help me.
Do I have to use the Python interpreter?
Answer 1:
You can:
- discard the millisecond part of a timestamp with date_trunc
- round a timestamp without the millisecond part down to a 5-second boundary with ts - interval '1' second * (second(ts) % 5)
Example putting this together:
presto> SELECT ts_rounded, count(*)
-> FROM (
-> SELECT date_trunc('second', ts) - interval '1' second * (second(ts) % 5) AS ts_rounded
-> FROM (VALUES timestamp '2017-10-24 23:01:20.206',
-> timestamp '2017-10-24 23:01:23.206',
-> timestamp '2017-10-24 23:01:23.207',
-> timestamp '2017-10-24 23:01:26.206') AS t(ts)
-> )
-> GROUP BY ts_rounded ORDER BY ts_rounded;
ts_rounded | _col1
-------------------------+-------
2017-10-24 23:01:20.000 | 3
2017-10-24 23:01:25.000 | 1
(2 rows)
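Applied to the data in the question, the same expression can be combined with the two extra grouping columns. The following is only a sketch: it uses an inline VALUES list instead of a real table, and the column name ts is an assumption standing in for the question's date column. With a real table, the VALUES subquery would be replaced by the table name.

```sql
-- Sketch: count rows per serviceType, logType and 5-second bucket.
-- The VALUES list mirrors the sample rows from the question;
-- "ts" is an assumed name for the timestamp column.
SELECT serviceType, logType, ts_rounded AS "date", count(*) AS cnt
FROM (
    SELECT serviceType,
           logType,
           -- drop milliseconds, then step back to the previous multiple of 5 seconds
           date_trunc('second', ts) - interval '1' second * (second(ts) % 5) AS ts_rounded
    FROM (VALUES
        ('service1', 'log1', timestamp '2017-10-24 23:00:23.206'),
        ('service1', 'log1', timestamp '2017-10-24 23:00:23.207'),
        ('service1', 'log1', timestamp '2017-10-24 23:00:25.206'),
        ('service2', 'log1', timestamp '2017-10-24 23:00:24.206'),
        ('service1', 'log2', timestamp '2017-10-24 23:00:27.206'),
        ('service1', 'log2', timestamp '2017-10-24 23:00:29.302')
    ) AS t(serviceType, logType, ts)
)
GROUP BY serviceType, logType, ts_rounded
ORDER BY ts_rounded, serviceType, logType;
```

This should produce the four rows listed as the expected result in the question. For the migration step, the same SELECT could be wrapped in CREATE TABLE ... AS SELECT, and later batches appended with INSERT INTO ... SELECT, provided the connector in use supports writes. Since everything is plain SQL, the Python interpreter should not be needed for the grouping itself.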
Source: https://stackoverflow.com/questions/47066024/how-to-group-time-column-into-5-second-intervals-and-count-rows-using-presto