How to combine two rows and calculate the time difference between two timestamp values in MySQL?

I have a situation that I'm sure is quite common, and it's really bothering me that I can't figure out how to do it or what to search for to find a relevant example/solution…


6 Answers

长发绾君心

2020-12-06 06:39

Try this.

```
select start.name, start.ts start, end.ts end, timediff(end.ts, start.ts) duration from (
    select *, (
        select id from log L2 where L2.ts>L1.ts and L2.name=L1.name order by ts limit 1
    ) stop_id from log L1
) start join log end on end.id=start.stop_id
where start.eventtype='start' and end.eventtype='stop';
```
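This is a minimal sketch of the same "next event for the same name" idea. It runs against an in-memory SQLite database through Python's built-in `sqlite3` module, so you can try it without a MySQL server.

Several details are assumptions rather than part of the answer:
- The table layout `log(id, name, ts, eventtype)` is inferred from the answers.
- Timestamps are assumed to be stored as `'YYYY-MM-DD HH:MM:SS'` text.
- SQLite has no `TIMEDIFF()`, so the duration is computed in seconds with `strftime('%s', ...)`.

```python
import sqlite3

# Assumed schema (inferred from the answers): log(id, name, ts, eventtype).
NEXT_EVENT_QUERY = """
    SELECT s.name, s.ts AS start_ts, e.ts AS stop_ts,
           CAST(strftime('%s', e.ts) AS INTEGER)
         - CAST(strftime('%s', s.ts) AS INTEGER) AS duration_s
    FROM (
        SELECT l1.*,
               (SELECT l2.id FROM log AS l2      -- id of the next event for the same name
                 WHERE l2.name = l1.name AND l2.ts > l1.ts
                 ORDER BY l2.ts LIMIT 1) AS stop_id
        FROM log AS l1
    ) AS s
    JOIN log AS e ON e.id = s.stop_id
    WHERE s.eventtype = 'start' AND e.eventtype = 'stop'
    ORDER BY s.id
"""


def pair_next_event(rows):
    """rows: iterable of (id, name, ts, eventtype) with ts as 'YYYY-MM-DD HH:MM:SS'."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE log (id INTEGER PRIMARY KEY, name TEXT, ts TEXT, eventtype TEXT)")
        conn.executemany("INSERT INTO log VALUES (?, ?, ?, ?)", rows)
        return conn.execute(NEXT_EVENT_QUERY).fetchall()
    finally:
        conn.close()
```

A start whose next event for the same name is another start produces no row, because the joined row fails the `e.eventtype = 'stop'` check.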


既然无缘

2020-12-06 06:42
Can you change the data collector? If so, add an indexed group_id column to the log table and write the id of the start event into it (the same id goes into group_id for both the start and the end event). Then you can do:
```
SELECT S.id, S.name, TIMEDIFF(E.ts, S.ts) `diff`
FROM `log` S
    JOIN `log` E ON S.id = E.group_id AND E.eventtype = 'stop'
WHERE S.eventtype = 'start'
```
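This is a minimal sketch of the group_id join, again in an in-memory SQLite database via Python's `sqlite3`. Several things are assumptions rather than part of the answer:
- The collector writes the start row's id into `group_id` for both rows of a pair.
- Stop rows carry `eventtype = 'stop'`, as in the other answers.
- The index name `log_group_id` is hypothetical.

The duration is computed in seconds because SQLite lacks `TIMEDIFF()`.

```python
import sqlite3


def durations_by_group(rows):
    """rows: iterable of (id, name, ts, eventtype, group_id)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("""CREATE TABLE log (id INTEGER PRIMARY KEY, name TEXT, ts TEXT,
                                          eventtype TEXT, group_id INTEGER)""")
        # Hypothetical index name; the answer only says group_id should be indexed.
        conn.execute("CREATE INDEX log_group_id ON log (group_id)")
        conn.executemany("INSERT INTO log VALUES (?, ?, ?, ?, ?)", rows)
        return conn.execute("""
            SELECT s.id, s.name,
                   CAST(strftime('%s', e.ts) AS INTEGER)
                 - CAST(strftime('%s', s.ts) AS INTEGER) AS diff_s
            FROM log AS s
            JOIN log AS e ON e.group_id = s.id AND e.eventtype = 'stop'
            WHERE s.eventtype = 'start'
            ORDER BY s.id
        """).fetchall()
    finally:
        conn.close()
```

Because the pairing is written down at insert time, the query is a plain equality join on an indexed column. A start with no stop yet simply has no match in the inner join.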

醉酒成梦

2020-12-06 06:49

If you don't mind creating a temporary table*, then I think the following should work well. I have tested it with 120,000 records, and the whole process completes in under 6 seconds. With 1,048,576 records it completed in just under 66 seconds - and that's on an old Pentium III with 128MB RAM:

*In MySQL 5.0 (and perhaps other versions) the temporary table cannot be a true MySQL temporary table, as you cannot refer to a TEMPORARY table more than once in the same query. See here:

http://dev.mysql.com/doc/refman/5.0/en/temporary-table-problems.html

Instead, just drop/create a normal table, as follows:

```
DROP TABLE IF EXISTS `tmp_log`;
CREATE TABLE `tmp_log` (
    `id` INT NOT NULL,
    `row` INT NOT NULL,
    `name` VARCHAR(16),
    `ts` DATETIME NOT NULL,
    `eventtype` VARCHAR(25),
    INDEX `row` (`row` ASC),
    INDEX `eventtype` (`eventtype` ASC)
);
```

This table is used to store a sorted and numbered list of rows from the following SELECT query:

```
INSERT INTO `tmp_log` (
    `id`,
    `row`,
    `name`,
    `ts`,
    `eventtype`
)
SELECT
    `id`,
    @row:=@row+1,
    `name`,
    `ts`,
    `eventtype`
FROM log,
(SELECT @row:=0) row_count
ORDER BY `name`, `id`;
```

The SELECT query above sorts the rows by name and then by id (you could use the timestamp instead of the id, as long as each start event sorts before its stop event) and numbers each row. As a result, matching pairs of events end up next to each other, and the row number of a start event is always one less than the row number of its stop event.

Now select the matching pairs from the list:

```
SELECT
    start_log.row AS start_row,
    stop_log.row AS stop_row,
    start_log.name AS name,
    start_log.eventtype AS start_event,
    start_log.ts AS start_time,
    stop_log.eventtype AS stop_event,
    stop_log.ts AS end_time,
    TIMEDIFF(stop_log.ts, start_log.ts) AS duration
FROM
    tmp_log AS start_log
INNER JOIN tmp_log AS stop_log
    ON start_log.row+1 = stop_log.row
    AND start_log.name = stop_log.name
    AND start_log.eventtype = 'start'
    AND stop_log.eventtype = 'stop'
ORDER BY start_log.id;
```
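The numbering trick can be mirrored in plain Python to check the logic: sort by (name, id), then compare each row with the one right after it. This is a sketch that assumes rows of `(id, name, ts, eventtype)` with `'YYYY-MM-DD HH:MM:SS'` timestamps.

On MySQL 8.0 or later, `ROW_NUMBER() OVER (ORDER BY name, id)` is a documented way to produce the same numbering without the `@row` variable. Whether that suits your server version is for you to check.

```python
from datetime import datetime


def pair_adjacent(rows):
    """Mimic tmp_log: number rows in (name, id) order and join row n to row n + 1.

    Returns (start_id, stop_id, name, duration_seconds) tuples ordered by start id.
    """
    ordered = sorted(rows, key=lambda r: (r[1], r[0]))  # ORDER BY name, id
    pairs = []
    # Walking consecutive rows is the start_log.row + 1 = stop_log.row join.
    for cur, nxt in zip(ordered, ordered[1:]):
        same_name = cur[1] == nxt[1]
        if same_name and cur[3] == 'start' and nxt[3] == 'stop':
            delta = datetime.fromisoformat(nxt[2]) - datetime.fromisoformat(cur[2])
            pairs.append((cur[0], nxt[0], cur[1], int(delta.total_seconds())))
    return sorted(pairs)  # ORDER BY start_log.id
```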

Once you're done, it's probably a good idea to drop the temporary table:

```
DROP TABLE IF EXISTS `tmp_log`;
```

UPDATE

You could try the following idea, which eliminates temp tables and joins altogether by using user variables to store values from the previous row. It sorts the rows by name and then timestamp, which groups all rows with the same name together and puts each group in time order. I think this should ensure that all corresponding start/stop events are next to each other.

```
SELECT id, name, start, stop, TIMEDIFF(stop, start) AS duration FROM (
    SELECT
        id, ts, eventtype,
        (@name <> name) AS new_name,
        @start AS start,
        @start := IF(eventtype = 'start', ts, NULL) AS prev_start,
        @stop  := IF(eventtype = 'stop',  ts, NULL) AS stop,
        @name  := name AS name
    -- initialise the variables inside the subquery so they are reset before the rows are scanned
    FROM table1, (SELECT @start:=NULL, @stop:=NULL, @name:=NULL) AS vars
    ORDER BY name, ts
) AS tmp
WHERE new_name = 0 AND start IS NOT NULL AND stop IS NOT NULL;
```
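The variable trick relies on MySQL evaluating the select-list expressions left to right, in sorted order. The MySQL manual states that the evaluation order of expressions involving user variables is undefined, and MySQL 8.0 deprecates setting user variables inside statements other than SET.

Where window functions are available, `LAG()` expresses "the previous row for the same name" directly. Note that this is a swapped-in technique, not the answerer's method. It is sketched here in SQLite (version 3.25 or later, which also has `LAG()`) via Python's `sqlite3`, with an assumed `log(id, name, ts, eventtype)` table and durations in seconds instead of `TIMEDIFF()`.

```python
import sqlite3

# PARTITION BY name plays the role of the new_name check; ORDER BY ts sets "previous".
LAG_QUERY = """
    SELECT id, name, prev_ts AS start_ts, ts AS stop_ts,
           CAST(strftime('%s', ts) AS INTEGER)
         - CAST(strftime('%s', prev_ts) AS INTEGER) AS duration_s
    FROM (
        SELECT id, name, ts, eventtype,
               LAG(ts)        OVER (PARTITION BY name ORDER BY ts) AS prev_ts,
               LAG(eventtype) OVER (PARTITION BY name ORDER BY ts) AS prev_type
        FROM log
    ) AS t
    WHERE eventtype = 'stop' AND prev_type = 'start'
    ORDER BY name, ts
"""


def pair_with_lag(rows):
    """rows: iterable of (id, name, ts, eventtype); returns one tuple per matched stop."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE log (id INTEGER PRIMARY KEY, name TEXT, ts TEXT, eventtype TEXT)")
        conn.executemany("INSERT INTO log VALUES (?, ?, ?, ?)", rows)
        return conn.execute(LAG_QUERY).fetchall()
    finally:
        conn.close()
```

On MySQL 8.0 the inner query can be written the same way, with `TIMEDIFF(ts, prev_ts)` for the duration. Check `SELECT VERSION()` before relying on it.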

I don't know how it will compare to Ivar Bonsaksen's method, but it runs fairly fast on my box.

Here's how I created the test data:

```
CREATE TABLE `table1` (
    `id` INT UNSIGNED NOT NULL AUTO_INCREMENT,
    `name` VARCHAR(5),
    `ts` DATETIME,
    `eventtype` VARCHAR(5),
    PRIMARY KEY (`id`),
    INDEX `name` (`name`),
    INDEX `ts` (`ts`)
) ENGINE=InnoDB;

DELIMITER //
DROP PROCEDURE IF EXISTS autofill//
CREATE PROCEDURE autofill()
BEGIN
    DECLARE i INT DEFAULT 0;
    WHILE i < 1000000 DO
        INSERT INTO table1 (name, ts, eventtype) VALUES (
            CHAR(FLOOR(65 + RAND() * 26)),
            DATE_ADD(NOW(), INTERVAL FLOOR(RAND() * 365) DAY),
            IF(RAND() >= 0.5, 'start', 'stop')
        );
        SET i = i + 1;
    END WHILE;
END;
//
DELIMITER ;

CALL autofill();
```
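If you want to compare approaches outside MySQL, this sketch produces rows shaped like the `autofill()` data. Each row has:
- a random capital letter as `name`;
- `NOW()` plus 0 to 364 whole days as `ts`;
- `start` or `stop` with equal probability as `eventtype`.

The function name `make_test_rows` and its `seed` parameter are hypothetical additions for reproducibility.

```python
import random
import string
from datetime import datetime, timedelta


def make_test_rows(n, seed=None):
    """Random (id, name, ts, eventtype) rows mimicking the autofill() procedure."""
    rng = random.Random(seed)
    now = datetime.now().replace(microsecond=0)
    rows = []
    for row_id in range(1, n + 1):
        name = rng.choice(string.ascii_uppercase)               # CHAR(FLOOR(65 + RAND() * 26))
        ts = now + timedelta(days=rng.randrange(365))           # DATE_ADD(NOW(), INTERVAL 0..364 DAY)
        eventtype = 'start' if rng.random() >= 0.5 else 'stop'  # IF(RAND() >= 0.5, 'start', 'stop')
        rows.append((row_id, name, ts.strftime('%Y-%m-%d %H:%M:%S'), eventtype))
    return rows
```

Every timestamp is `NOW()` plus whole days, so rows for the same name often share an identical `ts`. Methods that order by `ts` and methods that order by `id` can therefore disagree on this data.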


攒了一身酷

2020-12-06 06:52

I believe this could be a simpler way to reach your goal:

```
SELECT
    end_log.name,
    MAX(start_log.ts) AS start_time,
    end_log.ts AS end_time,
    TIMEDIFF(end_log.ts, MAX(start_log.ts)) AS duration
FROM
    log AS start_log
INNER JOIN
    log AS end_log ON (
            start_log.name = end_log.name
        AND
            end_log.ts > start_log.ts)
WHERE start_log.eventtype = 'start'
AND end_log.eventtype = 'stop'
GROUP BY end_log.name, end_log.ts
```

It should run considerably faster as it eliminates one subquery.
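This sketch pairs each stop with the latest start before it (one output row per stop event), using SQLite via Python's `sqlite3`. The schema is assumed to be `log(id, name, ts, eventtype)`, and durations are in seconds. The sketch also shows the weakness that the follow-up answer works around: when two stops follow one start, both stops get paired with that same start.

```python
import sqlite3

LATEST_START_QUERY = """
    SELECT e.name, MAX(s.ts) AS start_ts, e.ts AS stop_ts,
           CAST(strftime('%s', e.ts) AS INTEGER)
         - CAST(strftime('%s', MAX(s.ts)) AS INTEGER) AS duration_s
    FROM log AS s
    JOIN log AS e ON s.name = e.name AND e.ts > s.ts
    WHERE s.eventtype = 'start' AND e.eventtype = 'stop'
    GROUP BY e.name, e.ts          -- one output row per stop event
    ORDER BY e.name, e.ts
"""


def pair_latest_start(rows):
    """rows: iterable of (id, name, ts, eventtype)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE log (id INTEGER PRIMARY KEY, name TEXT, ts TEXT, eventtype TEXT)")
        conn.executemany("INSERT INTO log VALUES (?, ?, ?, ?)", rows)
        return conn.execute(LATEST_START_QUERY).fetchall()
    finally:
        conn.close()
```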


梦谈多话

2020-12-06 06:52

I got it working by combining both your solutions, but the query isn't very efficient, and I'd think there would be a smarter way to omit those unwanted rows.

What I've got now is:

```
SELECT y.name, 
       y.start, 
       y.stop, 
       TIMEDIFF(y.stop, y.start) 
  FROM (SELECT l.name, 
               MAX(x.ts) AS start, 
               l.ts AS stop 
          FROM log l 
          JOIN (SELECT t.name, 
                       t.ts 
                  FROM log t 
                 WHERE t.eventtype = 'start') x ON x.name = l.name 
                       AND x.ts < l.ts 
         WHERE l.eventtype = 'stop' 
      GROUP BY l.name, l.ts) y 
WHERE NOT EXISTS (SELECT 1 
                    FROM log AS d 
                   WHERE d.ts > y.start AND d.ts < y.stop AND d.name = y.name 
                         AND d.eventtype = 'stop')
```

Limited to a single name, the query goes from about 0.5 seconds to about 14 seconds when I include the WHERE NOT EXISTS clause. The table will become quite large, and I'm worried about how many hours this will take for all names in the end. I currently only have 10 days of data (June 2010) in the table, and it is already at 109,888 rows.
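This is a sketch of the combined query above, run in SQLite via Python's `sqlite3` to check that the NOT EXISTS filter drops a stop when another stop for the same name falls inside the pair. The schema `log(id, name, ts, eventtype)` is assumed, and the duration is given in seconds.

The composite index on `(name, eventtype, ts)` and its name are hypothetical. It is one common way to serve both the join and the NOT EXISTS probe, but whether MySQL actually uses an equivalent index for this query should be confirmed with EXPLAIN.

```python
import sqlite3

STOP_FILTER_QUERY = """
    SELECT y.name, y.start_ts, y.stop_ts,
           CAST(strftime('%s', y.stop_ts) AS INTEGER)
         - CAST(strftime('%s', y.start_ts) AS INTEGER) AS duration_s
    FROM (
        SELECT l.name, MAX(x.ts) AS start_ts, l.ts AS stop_ts
        FROM log AS l
        JOIN log AS x ON x.name = l.name AND x.eventtype = 'start' AND x.ts < l.ts
        WHERE l.eventtype = 'stop'
        GROUP BY l.name, l.ts
    ) AS y
    WHERE NOT EXISTS (
        SELECT 1 FROM log AS d          -- reject pairs with another stop in between
        WHERE d.name = y.name AND d.eventtype = 'stop'
          AND d.ts > y.start_ts AND d.ts < y.stop_ts
    )
    ORDER BY y.name, y.stop_ts
"""


def pair_with_stop_filter(rows):
    """rows: iterable of (id, name, ts, eventtype)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE log (id INTEGER PRIMARY KEY, name TEXT, ts TEXT, eventtype TEXT)")
        # Hypothetical composite index; verify its effect on MySQL with EXPLAIN.
        conn.execute("CREATE INDEX log_name_type_ts ON log (name, eventtype, ts)")
        conn.executemany("INSERT INTO log VALUES (?, ?, ?, ?)", rows)
        return conn.execute(STOP_FILTER_QUERY).fetchall()
    finally:
        conn.close()
```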


一个人的身影

2020-12-06 06:59
How about this:
```
SELECT start_log.ts AS start_time, end_log.ts AS end_time
FROM log AS start_log
INNER JOIN log AS end_log ON (start_log.name = end_log.name AND end_log.ts > start_log.ts)
WHERE NOT EXISTS (SELECT 1 FROM log WHERE log.name = start_log.name AND log.ts > start_log.ts AND log.ts < end_log.ts)
 AND start_log.eventtype = 'start'
 AND end_log.eventtype = 'stop'
```
This will find each pair of rows (aliased as start_log and end_log) with no events for the same name in between, where the first is always a start and the last is always a stop. Since intermediate events are disallowed, a start that's not immediately followed by a stop will naturally be excluded.
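This sketch runs the "no events in between" query in SQLite via Python's `sqlite3`, with an assumed `log(id, name, ts, eventtype)` table. The `same_name_only` flag is a hypothetical addition for illustration. It switches the `m.name = s.name` correlation on or off, which shows why the correlation matters: without it, an event for a different name inside the interval also hides the pair.

```python
import sqlite3


def pairs_without_gaps(rows, same_name_only=True):
    """Pair a start with a later stop for the same name when no event lies between them."""
    # The fragment is chosen from a boolean, never from user input, so the f-string is safe.
    name_filter = "AND m.name = s.name" if same_name_only else ""
    query = f"""
        SELECT s.name, s.ts, e.ts
        FROM log AS s
        JOIN log AS e ON s.name = e.name AND e.ts > s.ts
        WHERE s.eventtype = 'start' AND e.eventtype = 'stop'
          AND NOT EXISTS (
              SELECT 1 FROM log AS m
              WHERE m.ts > s.ts AND m.ts < e.ts {name_filter}
          )
        ORDER BY s.ts
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE log (id INTEGER PRIMARY KEY, name TEXT, ts TEXT, eventtype TEXT)")
        conn.executemany("INSERT INTO log VALUES (?, ?, ?, ?)", rows)
        return conn.execute(query).fetchall()
    finally:
        conn.close()
```

With interleaved names, `same_name_only=False` loses any pair whose interval contains another name's event, even though that pair is valid.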