MySQL workaround for window functions

南方客 2020-12-19 06:52

I have an event table that has the following fields:

event_id
event_type 
event_time

Given a duration D and a number k<

4 answers
  • 2020-12-19 06:58

    Edit: Rearranged whole answer

    Now I understand what you expect.

    I've created such a test table on my MySQL and this seems to work:

    -- Note: no semicolon after the ON clause, otherwise the statement ends before GROUP BY.
    SELECT e2.event_type FROM events e1
    JOIN events e2
        ON e1.event_time BETWEEN e2.event_time AND (e2.event_time + INTERVAL 10 MINUTE)
    GROUP BY e1.event_id, e2.event_type
    HAVING count(e2.event_type) >= 5;
    

    Basically, for each event e1 you self-join events on a relative time window (every e2 whose event_time falls in the window duration up to e1's event_time, here 10 minutes), and then you group by e1's event_id to emulate a sliding time window. We also group by event_type here because you want this field's value for each window.

    The one thing you need to think through is performance. I'm not sure it will be efficient enough for 1M records.
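
    To try the query on a small scale, a test table along these lines could be used. The column types and the sample rows are assumptions, since the question only names the three fields, and the threshold is lowered to 3 so the sample data produces a hit.

    ```sql
    -- Hypothetical test fixture; column types are assumed, not taken from the question.
    CREATE TABLE events (
        event_id   INT PRIMARY KEY,
        event_type VARCHAR(32) NOT NULL,
        event_time DATETIME NOT NULL
    );

    INSERT INTO events (event_id, event_type, event_time) VALUES
        (1, 'login', '2020-12-19 06:00:00'),
        (2, 'login', '2020-12-19 06:02:00'),
        (3, 'login', '2020-12-19 06:04:00'),
        (4, 'click', '2020-12-19 06:05:00'),
        (5, 'login', '2020-12-19 06:30:00');

    -- Same join condition as in the query above, with k = 3 instead of 5.
    -- DISTINCT collapses the per-window rows into one row per event type.
    SELECT DISTINCT e2.event_type
    FROM events e1
    JOIN events e2
        ON e1.event_time BETWEEN e2.event_time AND (e2.event_time + INTERVAL 10 MINUTE)
    GROUP BY e1.event_id, e2.event_type
    HAVING COUNT(*) >= 3;
    ```

    On this sample only 'login' has three events inside a single 10-minute window, so that is the only type the query should return.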

  • 2020-12-19 07:04

    MySQL (before version 8.0) has no window function support, but you can use a correlated subquery in the SELECT list to retrieve exactly one column:

    SELECT
      event_id,
      event_type, 
      event_time,
      (SELECT COUNT(*) FROM events EC WHERE EC.event_type = E.event_type AND EC.event_time > E.event_time) AS subsequent_event_count
    FROM
      events E
    WHERE ...
    

    Do run EXPLAIN on it. In terms of execution logic this is kinda the same as CROSS APPLY in SQL Server.

    Another approach is a self join:

    SELECT
      E.event_id,
      E.event_type,
      E.event_time,
      COUNT(EC.event_id) AS subsequent_event_count
    FROM
      events E
      LEFT JOIN events EC
        ON E.event_type = EC.event_type AND E.event_time < EC.event_time
    GROUP BY
      E.event_id,
      E.event_type,
      E.event_time
    

    Do test both approaches for performance.
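
    The correlated subquery matches on event_type and then on a range of event_time, so a composite index on those two columns is a reasonable first thing to try. This is only a sketch: the index name is made up, and whether the optimizer actually uses the index depends on your data, so check the plan.

    ```sql
    -- Hypothetical index name; columns ordered equality-first, range-second.
    CREATE INDEX idx_events_type_time ON events (event_type, event_time);

    -- Inspect the execution plan of the correlated-subquery variant.
    EXPLAIN
    SELECT
      E.event_id,
      (SELECT COUNT(*)
         FROM events EC
        WHERE EC.event_type = E.event_type
          AND EC.event_time > E.event_time) AS subsequent_event_count
    FROM events E;
    ```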

    You can do much more creative joins, like

    EC.event_time > E.event_time AND EC.event_time < E.event_time + INTERVAL 1 DAY
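
    Put into a full query, that condition might look like the sketch below. It assumes the same events table and counts, for each event, the later events of the same type within the following day; the alias events_in_next_day is illustrative.

    ```sql
    -- Counts same-type events strictly after each event and less than one day later.
    SELECT
      E.event_id,
      E.event_type,
      E.event_time,
      COUNT(EC.event_id) AS events_in_next_day
    FROM events E
    LEFT JOIN events EC
      ON  EC.event_type = E.event_type
      AND EC.event_time > E.event_time
      AND EC.event_time < E.event_time + INTERVAL 1 DAY
    GROUP BY E.event_id, E.event_type, E.event_time;
    ```

    The LEFT JOIN keeps events with no followers, which then get a count of 0 because COUNT(EC.event_id) ignores NULLs.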
    
  • 2020-12-19 07:11

    Do CTEs work fast enough? (They need MySQL 8.0 or later.)

    WITH etypes_in_range AS (
    SELECT tn.event_type,
           count(1) AS num
      FROM tablename tn
     WHERE tn.event_time < time_interval_end
       AND tn.event_time > time_interval_start
     GROUP BY tn.event_type
     HAVING count(1) > 5)
    SELECT count(1)
      FROM etypes_in_range
    
  • 2020-12-19 07:16

    Note that this lack of functionality is a thing of the past with MySQL 8 and later: https://dev.mysql.com/doc/refman/8.0/en/window-functions.html
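
    On MySQL 8.0 or later, the sliding-window count could be written directly with a window function. This is a sketch that assumes the events table from the question, a DATETIME event_time column, and a 10-minute window standing in for D; the alias events_in_window is illustrative.

    ```sql
    -- Requires MySQL 8.0+. For each row, counts events of the same type in the
    -- 10 minutes up to and including that row's event_time.
    SELECT
      event_id,
      event_type,
      event_time,
      COUNT(*) OVER (
        PARTITION BY event_type
        ORDER BY event_time
        RANGE BETWEEN INTERVAL 10 MINUTE PRECEDING AND CURRENT ROW
      ) AS events_in_window
    FROM events;
    ```

    To keep only the types that reach k events, wrap this in a derived table and filter on events_in_window in the outer query, since a window function result cannot be referenced in the WHERE clause of the same query level.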
