MySQL GROUP BY DateTime +/- 3 seconds

后端未结

关注

 5  1220

再見小時候

Suppose I have a table with 3 columns:

id (PK, int)
timestamp (datetime)
title (text)

I have the following records:

相关标签:

5条回答

爱一瞬间的悲伤

2020-12-03 05:58
Warning: Long answer. This should work, and is fairly neat, except for one step in the middle where you have to be willing to run an INSERT statement over and over until it doesn't do anything since we can't do recursive CTE things in MySQL.

I'm going to use this data as the example instead of yours:
```
id    Timestamp
1     1:00:00
2     1:00:03
3     1:00:06
4     1:00:10
```
Here is the first query to write:
```
SELECT a.id as aid, b.id as bid
FROM Table a
JOIN Table b 
ON (a.Timestamp is within 3 seconds of b.Timestamp)
```
It returns:
```
aid     bid
1       1
1       2
2       1
2       2
2       3
3       2
3       3
4       4
```
Let's create a nice table to hold those things that won't allow duplicates:
```
CREATE TABLE
Adjacency
( aid INT(11)
, bid INT(11)
, PRIMARY KEY (aid, bid) --important for later
)
```
Now the challenge is to find something like the transitive closure of that relation.

To do so, let's find the next level of links. by that I mean, since we have 1 2 and 2 3 in the Adjacency table, we should add 1 3:
```
INSERT IGNORE INTO Adjacency(aid,bid)
SELECT adj1.aid, adj2.bid
FROM Adjacency adj1
JOIN Adjacency adj2
ON (adj1.bid = adj2.aid)
```
This is the non-elegant part: You'll need to run the above INSERT statement over and over until it doesn't add any rows to the table. I don't know if there is a neat way to do that.

Once this is over, you will have a transitively-closed relation like this:
```
aid     bid
1       1
1       2
1       3     --added
2       1
2       2
2       3
3       1     --added
3       2
3       3
4       4
```
And now for the punchline:
```
SELECT aid, GROUP_CONCAT( bid ) AS Neighbors
FROM Adjacency
GROUP BY aid
```
returns:
```
aid     Neighbors
1       1,2,3
2       1,2,3
3       1,2,3
4       4
```
So
```
SELECT DISTINCT Neighbors
FROM (
     SELECT aid, GROUP_CONCAT( bid ) AS Neighbors
     FROM Adjacency
     GROUP BY aid
     ) Groupings
```
returns
```
Neighbors
1,2,3
4
```
Whew!
0 讨论(0)
发布评论:

提交评论
- 加载中...
渐次进展

2020-12-03 05:58
Simple query:
```
SELECT * FROM time_history GROUP BY ROUND(UNIX_TIMESTAMP(time_stamp)/3);
```
0 讨论(0)
发布评论:

提交评论
- 加载中...
闹比i

2020-12-03 05:59
I'm using Tom H.'s excellent idea but doing it a little differently here:

Instead of finding all the rows that are the beginnings of chains, we can find all times that are the beginnings of chains, then go back and ifnd the rows that match the times.

Query #1 here should tell you which times are the beginnings of chains by finding which times do not have any times below them but within 3 seconds:
```
SELECT DISTINCT Timestamp
FROM Table a
LEFT JOIN Table b
ON (b.Timestamp >= a.TimeStamp - INTERVAL 3 SECONDS
    AND b.Timestamp < a.Timestamp)
WHERE b.Timestamp IS NULL
```
And then for each row, we can find the largest chain-starting timestamp that is less than our timestamp with Query #2:
```
SELECT Table.id, MAX(StartOfChains.TimeStamp) AS ChainStartTime
FROM Table
JOIN ([query #1]) StartofChains
ON Table.Timestamp >= StartOfChains.TimeStamp
GROUP BY Table.id
```
Once we have that, we can GROUP BY it as you wanted.
```
SELECT COUNT(*) --or whatever
FROM Table
JOIN ([query #2]) GroupingQuery
ON Table.id = GroupingQuery.id
GROUP BY GroupingQuery.ChainStartTime
```
I'm not entirely sure this is distinct enough from Tom H's answer to be posted separately, but it sounded like you were having trouble with implementation, and I was thinking about it, so I thought I'd post again. Good luck!
0 讨论(0)
发布评论:

提交评论
- 加载中...

野性不改

2020-12-03 06:06

Now that I think that I understand your problem, based on your comment response to OMG Ponies, I think that I have a set-based solution. The idea is to first find the start of any chains based on the title. The start of a chain is going to be defined as any row where there is no match within three seconds prior to that row:

SELECT
    MT1.my_id,
    MT1.title,
    MT1.my_time
FROM
    My_Table MT1
LEFT OUTER JOIN My_Table MT2 ON
    MT2.title = MT1.title AND
    (
        MT2.my_time < MT1.my_time OR
        (MT2.my_time = MT1.my_time AND MT2.my_id < MT1.my_id)
    ) AND
    MT2.my_time >= MT1.my_time - INTERVAL 3 SECONDS
WHERE
    MT2.my_id IS NULL

Now we can assume that any non-chain starters belong to the chain starter that appeared before them. Since MySQL doesn't support CTEs, you might want to throw the above results into a temporary table, as that would save you the multiple joins to the same subquery below.

SELECT
    SQ1.my_id,
    COUNT(*)  -- You didn't say what you were trying to calculate, just that you needed to group them
FROM
(
    SELECT
        MT1.my_id,
        MT1.title,
        MT1.my_time
    FROM
        My_Table MT1
    LEFT OUTER JOIN My_Table MT2 ON
        MT2.title = MT1.title AND
        (
            MT2.my_time < MT1.my_time OR
            (MT2.my_time = MT1.my_time AND MT2.my_id < MT1.my_id)
        ) AND
        MT2.my_time >= MT1.my_time - INTERVAL 3 SECONDS
    WHERE
        MT2.my_id IS NULL
) SQ1
INNER JOIN My_Table MT3 ON
    MT3.title = SQ1.title AND
    MT3.my_time >= SQ1.my_time
LEFT OUTER JOIN
(
    SELECT
        MT1.my_id,
        MT1.title,
        MT1.my_time
    FROM
        My_Table MT1
    LEFT OUTER JOIN My_Table MT2 ON
        MT2.title = MT1.title AND
        (
            MT2.my_time < MT1.my_time OR
            (MT2.my_time = MT1.my_time AND MT2.my_id < MT1.my_id)
        ) AND
        MT2.my_time >= MT1.my_time - INTERVAL 3 SECONDS
    WHERE
        MT2.my_id IS NULL
) SQ2 ON
    SQ2.title = SQ1.title AND
    SQ2.my_time > SQ1.my_time AND
    SQ2.my_time <= MT3.my_time
WHERE
    SQ2.my_id IS NULL

This would look much simpler if you could use CTEs or if you used a temporary table. Using the temporary table might also help performance.

Also, there will be issues with this if you can have timestamps that match exactly. If that's the case then you will need to tweak the query slightly to use a combination of the id and the timestamp to distinguish rows with matching timestamp values.

EDIT: Changed the queries to handle exact matches by timestamp.

0 讨论(0)

太阳男子

2020-12-03 06:06
I like @Chris Cunningham's answer, but here's another take on it.

First, my understanding of your problem statement (correct me if I'm wrong):

You want to look at your event log as a sequence, ordered by the time of the event, and partitition it into groups, defining the boundary as being an interval of more than 3 seconds between two adjacent rows in the sequence.

I work mostly in SQL Server, so I'm using SQL Server syntax. It shouldn't be too difficult to translate into MySQL SQL.

So, first our event log table:
```
--
-- our event log table
--
create table dbo.eventLog
(
  id       int          not null ,
  dtLogged datetime     not null ,
  title    varchar(200) not null ,

  primary key nonclustered ( id ) ,
  unique clustered ( dtLogged , id ) ,

)
```
Given the above understanding of the problem statement, the following query should give you the upper and lower bounds your groups. It's a simple, nested select statement with 2 group by to collapse things:
- The innermost select defines the upper bound of each group. That upper boundary defines a group.
- The outer select defines the lower bound of each group.
Every row in the table should fall into one of the groups so defined, and any given group may well consist of a single date/time value.

[edited: the upper bound is the lowest date/time value where the interval is more than 3 seconds]
```
select dtFrom = min( t.dtFrom ) ,
       dtThru =      t.dtThru
from ( select dtFrom = t1.dtLogged ,
              dtThru = min( t2.dtLogged )
       from      dbo.EventLog t1
       left join dbo.EventLog t2 on t2.dtLogged >= t1.dtLogged
                                and datediff(second,t1.dtLogged,t2.dtLogged) > 3
       group by t1.dtLogged
     ) t
group by t.dtThru
```
You could then pull rows from the event log and tag them with the group to which they belong thus:
```
select *
from ( select dtFrom = min( t.dtFrom ) ,
              dtThru =      t.dtThru
       from ( select dtFrom = t1.dtLogged ,
                     dtThru = min( t2.dtLogged )
              from      dbo.EventLog t1
              left join dbo.EventLog t2 on t2.dtLogged >= t1.dtLogged
                                       and datediff(second,t1.dtLogged,t2.dtLogged) > 3
              group by t1.dtLogged
            ) t
       group by t.dtThru
     ) period
join dbo.EventLog t on t.dtLogged >=           period.dtFrom
                   and t.dtLogged <= coalesce( period.dtThru , t.dtLogged )
order by period.dtFrom , period.dtThru , t.dtLogged
```
Each row is tagged with its group via the dtFrom and dtThru columns returned. You could get fancy and assign an integral row number to each group if you want.
0 讨论(0)
发布评论:

提交评论
- 加载中...