MySQL GROUP BY DateTime +/- 3 seconds

再見小時候 2020-12-03 05:34

Suppose I have a table with 3 columns:

  • id (PK, int)
  • timestamp (datetime)
  • title (text)

I have the following records:

         


        
5 answers
  • 2020-12-03 05:58

    Warning: long answer. This should work, and it is fairly neat, except for one step in the middle where you have to be willing to run an INSERT statement over and over until it doesn't do anything, since MySQL (before 8.0) doesn't support recursive CTEs.

    I'm going to use this data as the example instead of yours:

    id    Timestamp
    1     1:00:00
    2     1:00:03
    3     1:00:06
    4     1:00:10
    

    Here is the first query to write:

    SELECT a.id AS aid, b.id AS bid
    FROM `Table` a
    JOIN `Table` b
    ON ABS(TIMESTAMPDIFF(SECOND, a.Timestamp, b.Timestamp)) <= 3
    

    It returns:

    aid     bid
    1       1
    1       2
    2       1
    2       2
    2       3
    3       2
    3       3
    4       4
    

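    Let's create a table to hold those pairs, with a primary key so it won't allow duplicates, and fill it with the result of the query above:

    CREATE TABLE Adjacency
    ( aid INT(11)
    , bid INT(11)
    , PRIMARY KEY (aid, bid) -- important for later
    );

    INSERT INTO Adjacency (aid, bid)
    SELECT a.id, b.id
    FROM `Table` a
    JOIN `Table` b
    ON ABS(TIMESTAMPDIFF(SECOND, a.Timestamp, b.Timestamp)) <= 3;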
    

    Now the challenge is to find something like the transitive closure of that relation.

    To do so, let's find the next level of links. By that I mean: since we have (1, 2) and (2, 3) in the Adjacency table, we should add (1, 3):

    INSERT IGNORE INTO Adjacency(aid,bid)
    SELECT adj1.aid, adj2.bid
    FROM Adjacency adj1
    JOIN Adjacency adj2
    ON (adj1.bid = adj2.aid)
    

    This is the non-elegant part: you'll need to run the above INSERT statement over and over until it doesn't add any rows to the table (ROW_COUNT() returns 0 right after it). Without recursive CTEs, I don't know of a neater way to do that.

    Once this is over, you will have a transitively-closed relation like this:

    aid     bid
    1       1
    1       2
    1       3     --added
    2       1
    2       2
    2       3
    3       1     --added
    3       2
    3       3
    4       4
    

    And now for the punchline:

    SELECT aid, GROUP_CONCAT( bid ORDER BY bid ) AS Neighbors
    FROM Adjacency
    GROUP BY aid
    

    returns:

    aid     Neighbors
    1       1,2,3
    2       1,2,3
    3       1,2,3
    4       4
    

    So

    SELECT DISTINCT Neighbors
    FROM (
         SELECT aid, GROUP_CONCAT( bid ORDER BY bid ) AS Neighbors
         FROM Adjacency
         GROUP BY aid
         ) Groupings
    

    returns

    Neighbors
    1,2,3
    4
    

    Whew!
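
    To sanity-check the steps above without a database, here is a minimal Python simulation of the same pipeline (a sketch in Python, not MySQL): build the within-3-seconds pairs, repeat the self-join insert until a pass adds nothing, then collect the distinct sorted neighbor lists. The sample data is the 1:00:00 to 1:00:10 example expressed as seconds; `adjacency`, `close_transitively`, and `neighbor_groups` are hypothetical helper names.

```python
# Sample data from the example above, as (id, seconds after 1:00:00).
ROWS = [(1, 0), (2, 3), (3, 6), (4, 10)]
WINDOW = 3  # seconds; "within 3 seconds" is treated as inclusive


def adjacency(rows, window=WINDOW):
    """First query: every (aid, bid) pair whose timestamps differ by <= window."""
    return {(a, b) for a, ta in rows for b, tb in rows if abs(ta - tb) <= window}


def close_transitively(pairs):
    """Repeat the self-join INSERT IGNORE until a pass adds no new pair."""
    pairs = set(pairs)
    passes = 0
    while True:
        passes += 1
        new = {(a, d) for a, b in pairs for c, d in pairs if b == c} - pairs
        if not new:  # the INSERT would affect 0 rows: fixed point reached
            return pairs, passes
        pairs |= new  # like the primary key, the set keeps one copy of each pair


def neighbor_groups(pairs):
    """GROUP_CONCAT(bid ORDER BY bid) per aid, then SELECT DISTINCT."""
    by_aid = {}
    for a, b in pairs:
        by_aid.setdefault(a, []).append(b)
    return sorted({",".join(map(str, sorted(bs))) for bs in by_aid.values()})


closure, passes = close_transitively(adjacency(ROWS))
print(neighbor_groups(closure))  # ['1,2,3', '4']
```

    On this data the second pass adds nothing, which matches the transitively-closed table shown above (10 pairs).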

  • 2020-12-03 05:58

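    Simple query (note that it puts rows into fixed 3-second buckets rather than chaining events that are within 3 seconds of each other):

    SELECT ROUND(UNIX_TIMESTAMP(time_stamp) / 3) AS bucket,
           MIN(time_stamp) AS first_ts,
           MAX(time_stamp) AS last_ts,
           COUNT(*) AS events
    FROM time_history
    GROUP BY bucket;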
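
    A quick way to see what this bucketing does is a small Python sketch, assuming integer Unix timestamps; the hypothetical `bucket` function mimics ROUND(x / 3) (a whole number divided by 3 never ends in exactly .5, so round-to-nearest equals `(t + 1) // 3`). It shows that events 1 second apart can fall into different buckets.

```python
def bucket(unix_seconds: int) -> int:
    """Mimic ROUND(UNIX_TIMESTAMP(t) / 3) for non-negative whole seconds.

    t / 3 never ends in exactly .5, so rounding to nearest equals (t + 1) // 3.
    """
    return (unix_seconds + 1) // 3


def group_by_bucket(times):
    """Collect sorted timestamps per bucket, as GROUP BY ROUND(.../3) would."""
    groups = {}
    for t in sorted(times):
        groups.setdefault(bucket(t), []).append(t)
    return list(groups.values())


# Hypothetical seconds: 4 and 5 are 1 s apart yet land in different buckets.
print(group_by_bucket([0, 1, 4, 5]))  # [[0, 1], [4], [5]]
```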
    
  • 2020-12-03 05:59

    I'm using Tom H.'s excellent idea but doing it a little differently here:

    Instead of finding all the rows that are the beginnings of chains, we can find all times that are the beginnings of chains, then go back and find the rows that match those times.

    Query #1 here should tell you which times are the beginnings of chains by finding which times do not have any times below them but within 3 seconds:

    SELECT DISTINCT a.Timestamp
    FROM `Table` a
    LEFT JOIN `Table` b
    ON (b.Timestamp >= a.Timestamp - INTERVAL 3 SECOND
        AND b.Timestamp < a.Timestamp)
    WHERE b.Timestamp IS NULL
    

    And then, for each row, we can find the largest chain-starting timestamp that is less than or equal to our timestamp with Query #2:

    SELECT `Table`.id, MAX(StartOfChains.Timestamp) AS ChainStartTime
    FROM `Table`
    JOIN ([query #1]) StartOfChains
    ON `Table`.Timestamp >= StartOfChains.Timestamp
    GROUP BY `Table`.id
    

    Once we have that, we can GROUP BY it as you wanted.

    SELECT COUNT(*) -- or whatever
    FROM `Table`
    JOIN ([query #2]) GroupingQuery
    ON `Table`.id = GroupingQuery.id
    GROUP BY GroupingQuery.ChainStartTime
    

    I'm not entirely sure this is distinct enough from Tom H.'s answer to be posted separately, but it sounded like you were having trouble with the implementation, and I had been thinking about it, so I thought I'd post again. Good luck!
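
    The chain-start idea can be traced in a short Python sketch (a simulation, not MySQL; the helper names are hypothetical, timestamps are whole seconds, and "within 3 seconds" is inclusive as in Query #1): find the start times, map each row to the largest start at or before it, then count rows per start.

```python
import bisect

WINDOW = 3  # seconds; inclusive, matching ">= a.Timestamp - INTERVAL 3 SECOND"


def chain_starts(times, window=WINDOW):
    """Query #1: times with no *earlier* time at most `window` seconds before them."""
    ts = sorted(set(times))
    return [t for t in ts if not any(t - window <= u < t for u in ts)]


def chain_start_for(rows, window=WINDOW):
    """Query #2: for each row id, the largest chain start <= its timestamp."""
    starts = chain_starts([t for _, t in rows], window)
    # The smallest time is always a start, so the index below is never -1.
    return {rid: starts[bisect.bisect_right(starts, t) - 1] for rid, t in rows}


def group_counts(rows, window=WINDOW):
    """Final query: GROUP BY ChainStartTime, counting rows per chain."""
    counts = {}
    for start in chain_start_for(rows, window).values():
        counts[start] = counts.get(start, 0) + 1
    return counts


rows = [(1, 0), (2, 3), (3, 6), (4, 10)]  # hypothetical (id, seconds)
print(chain_starts([t for _, t in rows]))  # [0, 10]
print(group_counts(rows))  # {0: 3, 10: 1}
```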

  • 2020-12-03 06:06

    Now that I think I understand your problem, based on your reply to OMG Ponies' comment, I think I have a set-based solution. The idea is to first find the start of every chain for each title. The start of a chain is defined as any row that has no row with the same title within the three seconds before it:

    SELECT
        MT1.my_id,
        MT1.title,
        MT1.my_time
    FROM
        My_Table MT1
    LEFT OUTER JOIN My_Table MT2 ON
        MT2.title = MT1.title AND
        (
            MT2.my_time < MT1.my_time OR
            (MT2.my_time = MT1.my_time AND MT2.my_id < MT1.my_id)
        ) AND
        MT2.my_time >= MT1.my_time - INTERVAL 3 SECOND
    WHERE
        MT2.my_id IS NULL
    

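    Now we can assume that any row that isn't a chain starter belongs to the most recent chain starter (with the same title) that appeared before it. Since MySQL (before 8.0) doesn't support CTEs, you might want to put the above results into a temporary table, as that would save you the multiple joins to the same subquery below (note that MySQL won't let one query refer to the same TEMPORARY table twice, so you may need two copies of it).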

    SELECT
        SQ1.my_id,
        COUNT(*)  -- You didn't say what you were trying to calculate, just that you needed to group them
    FROM
    (
        SELECT
            MT1.my_id,
            MT1.title,
            MT1.my_time
        FROM
            My_Table MT1
        LEFT OUTER JOIN My_Table MT2 ON
            MT2.title = MT1.title AND
            (
                MT2.my_time < MT1.my_time OR
                (MT2.my_time = MT1.my_time AND MT2.my_id < MT1.my_id)
            ) AND
            MT2.my_time >= MT1.my_time - INTERVAL 3 SECOND
        WHERE
            MT2.my_id IS NULL
    ) SQ1
    INNER JOIN My_Table MT3 ON
        MT3.title = SQ1.title AND
        MT3.my_time >= SQ1.my_time
    LEFT OUTER JOIN
    (
        SELECT
            MT1.my_id,
            MT1.title,
            MT1.my_time
        FROM
            My_Table MT1
        LEFT OUTER JOIN My_Table MT2 ON
            MT2.title = MT1.title AND
            (
                MT2.my_time < MT1.my_time OR
                (MT2.my_time = MT1.my_time AND MT2.my_id < MT1.my_id)
            ) AND
            MT2.my_time >= MT1.my_time - INTERVAL 3 SECOND
        WHERE
            MT2.my_id IS NULL
    ) SQ2 ON
        SQ2.title = SQ1.title AND
        SQ2.my_time > SQ1.my_time AND
        SQ2.my_time <= MT3.my_time
    WHERE
        SQ2.my_id IS NULL
    GROUP BY
        SQ1.my_id
    

    This would look much simpler if you could use CTEs (available from MySQL 8.0) or a temporary table. Using a temporary table might also help performance.

    Also, there will be issues with this if you can have timestamps that match exactly. If that's the case then you will need to tweak the query slightly to use a combination of the id and the timestamp to distinguish rows with matching timestamp values.

    EDIT: Changed the queries to handle exact matches by timestamp.
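
    The per-title rule with the id tie-breaker can also be sketched in Python (a simulation of the logic, not MySQL; it assumes `my_time` is in whole seconds, and `assign_chains` is a hypothetical name): sort each title's rows by (time, id), mark a row as a chain starter when the previous row is more than 3 seconds earlier, and give every other row the most recent starter.

```python
from collections import defaultdict

WINDOW = 3  # seconds; "within 3 seconds" is inclusive, as in the SQL


def assign_chains(rows, window=WINDOW):
    """Map each my_id to the my_id of the chain starter it belongs to.

    rows: iterable of (my_id, title, my_time), my_time in whole seconds.
    """
    by_title = defaultdict(list)
    for my_id, title, my_time in rows:
        by_title[title].append((my_time, my_id))

    owner = {}
    for events in by_title.values():
        events.sort()  # order by (my_time, my_id): ids break timestamp ties
        starter = None
        for i, (t, rid) in enumerate(events):
            # After sorting, the previous row is the closest earlier one, so it
            # alone decides whether any same-title row lies within the window.
            if i == 0 or events[i - 1][0] < t - window:
                starter = rid  # no earlier row within 3 s: a new chain starts
            owner[rid] = starter
    return owner


# Hypothetical rows: ids 4 and 5 share a timestamp; id 3 has another title.
rows = [(1, "a", 0), (2, "a", 3), (3, "b", 3), (4, "a", 7), (5, "a", 7)]
print(assign_chains(rows))  # {1: 1, 2: 1, 4: 4, 5: 4, 3: 3}
```

    Counting the values of that mapping gives the same per-starter totals that the GROUP BY SQ1.my_id query is meant to return.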

  • 2020-12-03 06:06

    I like @Chris Cunningham's answer, but here's another take on it.

    First, my understanding of your problem statement (correct me if I'm wrong):

    You want to look at your event log as a sequence, ordered by the time of the event, and partition it into groups, where a boundary is an interval of more than 3 seconds between two adjacent rows in the sequence.
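
    That problem statement is the classic gaps-and-islands pattern. A minimal Python sketch of it (not SQL Server or MySQL; `partition_by_gap` is a hypothetical name and the sample timestamps are made up) walks the timestamps in order and starts a new group whenever the gap to the previous one exceeds 3 seconds:

```python
from datetime import datetime, timedelta

GAP = timedelta(seconds=3)  # adjacent rows more than this apart start a new group


def partition_by_gap(stamps, gap=GAP):
    """Split timestamps, taken in time order, wherever neighbours differ by > gap."""
    groups = []
    for ts in sorted(stamps):
        if groups and ts - groups[-1][-1] <= gap:
            groups[-1].append(ts)  # within 3 s of the previous row: same group
        else:
            groups.append([ts])  # first row, or a gap of more than 3 s
    return groups


# Hypothetical log: 1:00:00, 1:00:03, 1:00:06, 1:00:10 on an arbitrary date.
base = datetime(2020, 12, 3, 1, 0, 0)
stamps = [base + timedelta(seconds=s) for s in (0, 3, 6, 10)]
groups = partition_by_gap(stamps)
print([(str(g[0].time()), str(g[-1].time())) for g in groups])
# [('01:00:00', '01:00:06'), ('01:00:10', '01:00:10')]
```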

    I work mostly in SQL Server, so I'm using SQL Server syntax. It shouldn't be too difficult to translate into MySQL.

    So, first our event log table:

    --
    -- our event log table
    --
    create table dbo.EventLog
    (
      id       int          not null ,
      dtLogged datetime     not null ,
      title    varchar(200) not null ,
    
      primary key nonclustered ( id ) ,
      unique clustered ( dtLogged , id )
    
    )
    

    Given the above understanding of the problem statement, the following query should give you the upper and lower bounds of your groups. It's a simple nested select statement with two GROUP BY clauses to collapse things:

    • The innermost select defines the upper bound of each group. That upper boundary defines a group.
    • The outer select defines the lower bound of each group.

    Every row in the table should fall into one of the groups so defined, and any given group may well consist of a single date/time value.

    [edited: the upper bound is the lowest later date/time value that is more than 3 seconds after its predecessor, i.e. the start of the next group; it is exclusive]

    select dtFrom = min( t.dtFrom ) ,
           dtThru =      t.dtThru
    from ( select dtFrom = t1.dtLogged ,
                  dtThru = min( t2.dtLogged )
           from      dbo.EventLog t1
           left join dbo.EventLog t2 on t2.dtLogged > t1.dtLogged
                                    -- t2 starts a new group: no row within 3 seconds before it
                                    and not exists ( select *
                                                     from dbo.EventLog p
                                                     where p.dtLogged <  t2.dtLogged
                                                       and datediff(second,p.dtLogged,t2.dtLogged) <= 3
                                                   )
           group by t1.dtLogged
         ) t
    group by t.dtThru
    

    You could then pull rows from the event log and tag them with the group to which they belong thus:

    select *
    from ( select dtFrom = min( t.dtFrom ) ,
                  dtThru =      t.dtThru
           from ( select dtFrom = t1.dtLogged ,
                         dtThru = min( t2.dtLogged )
                  from      dbo.EventLog t1
                  left join dbo.EventLog t2 on t2.dtLogged > t1.dtLogged
                                           and not exists ( select *
                                                            from dbo.EventLog p
                                                            where p.dtLogged <  t2.dtLogged
                                                              and datediff(second,p.dtLogged,t2.dtLogged) <= 3
                                                          )
                  group by t1.dtLogged
                ) t
           group by t.dtThru
         ) period
    join dbo.EventLog t on t.dtLogged >= period.dtFrom
                       and ( period.dtThru is null or t.dtLogged < period.dtThru ) -- dtThru is exclusive
    order by period.dtFrom , period.dtThru , t.dtLogged
    

    Each row is tagged with its group via the dtFrom and dtThru columns returned. You could get fancy and assign an integral row number to each group if you want.
