How to combine two rows and calculate the time difference between two timestamp values in MySQL?

误落风尘 2020-12-06 06:14

I have a situation that I'm sure is quite common and it's really bothering me that I can't figure out how to do it or what to search for to find a relevant example/solution…

6 Answers
  •  醉酒成梦
    2020-12-06 06:49

    If you don't mind creating a temporary table*, then I think the following should work well. I have tested it with 120,000 records, and the whole process completes in under 6 seconds. With 1,048,576 records it completed in just under 66 seconds, and that's on an old Pentium III with 128MB RAM.

    *In MySQL 5.0 (and perhaps other versions) the temporary table cannot be a true MySQL temporary table, as you cannot refer to a TEMPORARY table more than once in the same query. See here:

    http://dev.mysql.com/doc/refman/5.0/en/temporary-table-problems.html

    Instead, just drop/create a normal table, as follows:

    DROP TABLE IF EXISTS `tmp_log`;
    CREATE TABLE `tmp_log` (
        `id` INT NOT NULL,
        `row` INT NOT NULL,
        `name` VARCHAR(16),
        `ts` DATETIME NOT NULL,
        `eventtype` VARCHAR(25),
        INDEX `row` (`row` ASC),
        INDEX `eventtype` (`eventtype` ASC)
    );
    

    This table is used to store a sorted and numbered list of rows from the following SELECT query:

    INSERT INTO `tmp_log` (
        `id`,
        `row`,
        `name`,
        `ts`,
        `eventtype`
    )
    SELECT
        `id`,
        @row:=@row+1,
        `name`,
        `ts`,
        `eventtype`
    FROM log,
    (SELECT @row:=0) row_count
    ORDER BY `name`, `id`;
    

    The SELECT query above sorts the rows by name and then by id (you could sort by the timestamp instead of the id, as long as each start event sorts before its stop event). Each row is also numbered. As a result, the two events of a matching pair are always next to each other, and the row number of the start event is always one less than the row number of the stop event.

    Now select the matching pairs from the list:

    SELECT
        start_log.row AS start_row,
        stop_log.row AS stop_row,
        start_log.name AS name,
        start_log.eventtype AS start_event,
        start_log.ts AS start_time,
        stop_log.eventtype AS stop_event,
        stop_log.ts AS end_time,
        TIMEDIFF(stop_log.ts, start_log.ts) AS duration
    FROM
        tmp_log AS start_log
    INNER JOIN tmp_log AS stop_log
        ON start_log.row+1 = stop_log.row
        AND start_log.name = stop_log.name
        AND start_log.eventtype = 'start'
        AND stop_log.eventtype = 'stop'
    ORDER BY start_log.id;
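
    To make the numbering and join steps easier to follow, here is a minimal Python sketch of the same logic. It is only an illustration, not part of the MySQL solution: the sample rows and the function name `pair_events` are made up for this example, and the result comes back in row-number order rather than ordered by the start event's id.

```python
from datetime import datetime

# Hypothetical sample rows (id, name, ts, eventtype); not from the original post.
log = [
    (1, "A", datetime(2020, 1, 1, 8, 0, 0), "start"),
    (2, "B", datetime(2020, 1, 1, 8, 5, 0), "start"),
    (3, "A", datetime(2020, 1, 1, 8, 30, 0), "stop"),
    (4, "B", datetime(2020, 1, 1, 9, 0, 0), "stop"),
    (5, "A", datetime(2020, 1, 1, 10, 0, 0), "start"),  # start with no stop
]


def pair_events(rows):
    # Step 1: sort by (name, id) and number the rows from 1,
    # which is what the INSERT ... SELECT into tmp_log does.
    ordered = sorted(rows, key=lambda r: (r[1], r[0]))
    numbered = {n: r for n, r in enumerate(ordered, start=1)}

    # Step 2: match row n with row n + 1 using the same conditions
    # as the ON clause of the self-join.
    pairs = []
    for n, start in numbered.items():
        stop = numbered.get(n + 1)
        if (stop is not None
                and start[1] == stop[1]
                and start[3] == "start"
                and stop[3] == "stop"):
            # (name, start_time, end_time, duration)
            pairs.append((start[1], start[2], stop[2], stop[2] - start[2]))
    return pairs


pairs = pair_events(log)
for name, start_ts, stop_ts, duration in pairs:
    print(name, start_ts, stop_ts, duration)
```

    The unmatched start event (id 5) is simply dropped, because the row that follows it in the sorted list has a different name, just as the INNER JOIN would drop it.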
    

    Once you're done, it's probably a good idea to drop the temporary table:

    DROP TABLE IF EXISTS `tmp_log`;
    

    UPDATE

    You could try the following idea, which eliminates temp tables and joins altogether by using variables to store values from the previous row. It sorts the rows by name and then by timestamp, which groups all rows with the same name together and puts each group in time order. I think that this should ensure that all corresponding start/stop events are next to each other.

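    SELECT id, name, start, stop, TIMEDIFF(stop, start) AS duration FROM (
        SELECT
            id, ts, eventtype,
            -- 0 when this row has the same name as the previous row
            (@name <> name) AS new_name,
            -- @start still holds the value set by the previous row here
            -- (this relies on the select list being evaluated left to right)
            @start AS start,
            -- now remember this row's ts if it is a start event, else NULL
            @start := IF(eventtype = 'start', ts, NULL) AS prev_start,
            @stop  := IF(eventtype = 'stop',  ts, NULL) AS stop,
            @name  := name AS name
        FROM table1 ORDER BY name, ts
    ) AS tmp, (SELECT @start:=NULL, @stop:=NULL, @name:=NULL) AS vars
    -- keep only stop rows that directly follow a start row with the same name
    WHERE new_name = 0 AND start IS NOT NULL AND stop IS NOT NULL;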
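
    The way the variables carry state from one row to the next can be sketched in Python as a single pass over the sorted rows. This is only an illustration under assumptions: the sample rows and the name `scan_pairs` are made up, and `prev_name` and `prev_start` stand in for `@name` and `@start`.

```python
from datetime import datetime

# Hypothetical sample rows (id, name, ts, eventtype); not from the original post.
rows = [
    (1, "A", datetime(2020, 1, 1, 8, 0, 0), "start"),
    (2, "B", datetime(2020, 1, 1, 8, 5, 0), "start"),
    (3, "A", datetime(2020, 1, 1, 8, 30, 0), "stop"),
    (4, "B", datetime(2020, 1, 1, 9, 0, 0), "stop"),
    (5, "A", datetime(2020, 1, 1, 10, 0, 0), "start"),  # start with no stop
    (6, "B", datetime(2020, 1, 1, 9, 30, 0), "stop"),   # stop with no start
]


def scan_pairs(rows):
    prev_name = None   # stands in for @name
    prev_start = None  # stands in for @start
    result = []
    # Same ordering as "ORDER BY name, ts" in the subquery.
    for id_, name, ts, eventtype in sorted(rows, key=lambda r: (r[1], r[2])):
        same_name = prev_name is not None and prev_name == name
        # Equivalent of the outer WHERE clause: same name as the previous
        # row, the previous row was a start, and this row is a stop.
        if same_name and prev_start is not None and eventtype == "stop":
            result.append((id_, name, prev_start, ts, ts - prev_start))
        # Update the state only after the previous values have been used.
        prev_start = ts if eventtype == "start" else None
        prev_name = name
    return result


matched = scan_pairs(rows)
for id_, name, start_ts, stop_ts, duration in matched:
    print(id_, name, start_ts, stop_ts, duration)
```

    As in the query, the id in each result row is the id of the stop event, and a start without a stop or a stop without a start produces no output.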
    

    I don't know how it will compare to Ivar Bonsaksen's method, but it runs fairly fast on my box.

    Here's how I created the test data:

    CREATE TABLE  `table1` (
        `id` INT UNSIGNED NOT NULL AUTO_INCREMENT,
        `name` VARCHAR(5),
        `ts` DATETIME,
        `eventtype` VARCHAR(5),
        PRIMARY KEY (`id`),
        INDEX `name` (`name`),
        INDEX `ts` (`ts`)
    ) ENGINE=InnoDB;
    
    DELIMITER //
    DROP PROCEDURE IF EXISTS autofill//
    CREATE PROCEDURE autofill()
    BEGIN
        DECLARE i INT DEFAULT 0;
        WHILE i < 1000000 DO
            INSERT INTO table1 (name, ts, eventtype) VALUES (
                CHAR(FLOOR(65 + RAND() * 26)),
                DATE_ADD(NOW(), INTERVAL FLOOR(RAND() * 365) DAY),
                IF(RAND() >= 0.5, 'start', 'stop')
            );
            SET i = i + 1;
        END WHILE;
    END;
    //
    DELIMITER ;
    
    CALL autofill();
    
