How do I find records that have the same value in adjacent records in SQL Server? (I believe the correct term for this is a region?)

99封情书 submitted on 2019-12-06 11:45:26

EDIT:

On review, none of my original work on this answer was very good. This actually belongs to the class of problems known as gaps-and-islands, and this revised answer uses what I've gleaned from similar questions and learned since first answering this one.

It turns out this query can be done a lot more simply than I originally thought:

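-- Number rows overall and per heartRate; the difference stays constant within each run
WITH Grouped_Run AS (SELECT heartRate, dateTime,
                            ROW_NUMBER() OVER(ORDER BY dateTime) -
                            ROW_NUMBER() OVER(PARTITION BY heartRate ORDER BY dateTime) AS groupingId
                     FROM HeartRate)

-- One output row per run (its value, start and end time), keeping runs of 3+ rows
SELECT heartRate, MIN(dateTime) AS runStart, MAX(dateTime) AS runEnd
FROM Grouped_Run
GROUP BY heartRate, groupingId
HAVING COUNT(*) > 2;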
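To try the query without a SQL Server instance, here is a minimal sketch that runs the same query through Python's built-in sqlite3 module. It assumes SQLite 3.25 or later (needed for ROW_NUMBER() OVER), a HeartRate table with the column names used above, and a handful of made-up readings. dateTime is stored as ISO-formatted text, so sorting it as a string is chronological.

```python
import sqlite3

# Hypothetical sample data: (dateTime, heartRate), one reading per minute.
# 72 repeats three times in a row, 80 only twice, then 72 appears once more.
READINGS = [
    ("2019-01-01 10:00", 70),
    ("2019-01-01 10:01", 72),
    ("2019-01-01 10:02", 72),
    ("2019-01-01 10:03", 72),
    ("2019-01-01 10:04", 80),
    ("2019-01-01 10:05", 80),
    ("2019-01-01 10:06", 72),
]

QUERY = """
WITH Grouped_Run AS (
    SELECT heartRate, dateTime,
           ROW_NUMBER() OVER (ORDER BY dateTime)
         - ROW_NUMBER() OVER (PARTITION BY heartRate ORDER BY dateTime) AS groupingId
    FROM HeartRate
)
SELECT heartRate, MIN(dateTime) AS runStart, MAX(dateTime) AS runEnd
FROM Grouped_Run
GROUP BY heartRate, groupingId
HAVING COUNT(*) > 2
ORDER BY runStart
"""


def stuck_runs(readings):
    """Load the readings into an in-memory table and return runs of 3+ equal values."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE HeartRate (dateTime TEXT, heartRate INTEGER)")
    conn.executemany(
        "INSERT INTO HeartRate (dateTime, heartRate) VALUES (?, ?)", readings
    )
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


print(stuck_runs(READINGS))
```

With this data only the three consecutive 72 readings come back: the pair of 80s is too short, and the final 72 forms a separate run of one because the 80s interrupt it.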



So what's happening here? Part of what defines gaps-and-islands problems is the need to identify "groups" of consecutive values (or of missing ones). A common way to solve them is to generate sequences and exploit an easily overlooked fact: if two sequences both increase by one from row to row, subtracting one from the other yields a constant value for as long as they advance together.

For example, imagine the following sequences and their difference (the values in the rows themselves are unimportant):

position   positionInGroup  subtraction
=========================================
1          1                0
2          2                0
3          3                0
4          1                3
5          2                3
6          1                5
7          4                3
8          5                3

position is a simple sequence generated over all records.
positionInGroup is a simple sequence generated separately for each set of identical values. In this case there are 3 different sets: one starting at position 1 (which resumes at position 7, so its count continues with 4 and 5), one starting at position 4, and one at position 6.
subtraction is the difference between the other two columns. Note that it can repeat across different sets: rows 4-5 and rows 7-8 both get 3, so the subtraction alone does not identify a run.
A key requirement is that both sequences be generated over the rows in the same order; otherwise the difference is no longer constant within a run and the technique breaks.
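The table can be reproduced in a few lines of plain Python, which also shows why the subtraction alone is not enough. The labels A, B and C are hypothetical stand-ins for three distinct values; set A occupies positions 1-3 and 7-8.

```python
from collections import defaultdict

# Hypothetical values matching the table: set A at positions 1-3 and 7-8,
# set B at positions 4-5, set C at position 6.
values = ["A", "A", "A", "B", "B", "C", "A", "A"]


def number_rows(values):
    """Return (position, positionInGroup, subtraction) for each row, in order."""
    counts = defaultdict(int)  # running count of rows seen so far per value
    rows = []
    for position, value in enumerate(values, start=1):
        counts[value] += 1
        position_in_group = counts[value]
        rows.append((position, position_in_group, position - position_in_group))
    return rows


rows = number_rows(values)
for row in rows:
    print(row)

# The subtraction alone yields only 3 distinct ids (0, 3, 5): the B run and the
# second A run collide on 3. Pairing it with the value separates all 4 runs.
print(sorted({sub for _, _, sub in rows}))
print(sorted({(value, sub) for value, (_, _, sub) in zip(values, rows)}))
```

Both counters advance inside the same single pass over the list, which is what the same-order requirement amounts to here.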

So how does SQL do this? Through ROW_NUMBER(), which generates a sequence of numbers over a "window" of records:

ROW_NUMBER() OVER(ORDER BY dateTime)

will generate the position sequence.

ROW_NUMBER() OVER(PARTITION BY heartRate ORDER BY dateTime)

will generate the positionInGroup sequence, with each heartRate being a different group.
In most queries of this type, the values of the two sequences themselves are unimportant; only their difference (which identifies the run) matters, so we keep just the result of the subtraction.
We'll also need the heartRate and the times at which each reading occurred to provide the answer.
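To see the two ROW_NUMBER() sequences side by side, the sketch below runs them through Python's sqlite3 module (SQLite 3.25+ assumed for window functions). The readings are made up so that the heart-rate pattern mirrors the example table, with 72, 80 and 65 playing the roles of the three sets; times are stored as "HH:MM" text, which sorts chronologically within a single day.

```python
import sqlite3

# Hypothetical readings mirroring the example table: 72 is the first set
# (positions 1-3 and 7-8), 80 the second (4-5), 65 the third (6).
READINGS = [
    ("10:00", 72), ("10:01", 72), ("10:02", 72),
    ("10:03", 80), ("10:04", 80),
    ("10:05", 65),
    ("10:06", 72), ("10:07", 72),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE HeartRate (dateTime TEXT, heartRate INTEGER)")
conn.executemany("INSERT INTO HeartRate (dateTime, heartRate) VALUES (?, ?)", READINGS)

# Both sequences are ordered by dateTime, so they advance over the rows in the same order.
rows = conn.execute("""
    SELECT dateTime, heartRate,
           ROW_NUMBER() OVER (ORDER BY dateTime) AS position,
           ROW_NUMBER() OVER (PARTITION BY heartRate ORDER BY dateTime) AS positionInGroup
    FROM HeartRate
    ORDER BY dateTime
""").fetchall()

for date_time, heart_rate, position, position_in_group in rows:
    print(date_time, heart_rate, position, position_in_group, position - position_in_group)
```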

The original question asked for the start and end times of each "run" of stuck heartbeats. That's a standard MIN(...)/MAX(...), which means a GROUP BY. We need to group by both the original heartRate column (because it's a non-aggregated column in the SELECT list) and our generated groupingId (which identifies each "run" of a given stuck value; on its own it is not unique across different values).
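The sketch below contrasts grouping by heartRate and groupingId with grouping by groupingId alone. It again uses Python's sqlite3 module (SQLite 3.25+ assumed) and hypothetical readings: a run of 72, a run of 80, a single 65, then 72 again. HAVING is left out so every run is visible.

```python
import sqlite3

# Hypothetical readings: three 72s, two 80s, one 65, two more 72s.
READINGS = [
    ("10:00", 72), ("10:01", 72), ("10:02", 72),
    ("10:03", 80), ("10:04", 80),
    ("10:05", 65),
    ("10:06", 72), ("10:07", 72),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE HeartRate (dateTime TEXT, heartRate INTEGER)")
conn.executemany("INSERT INTO HeartRate (dateTime, heartRate) VALUES (?, ?)", READINGS)

# The CTE from the answer, computing groupingId for every row.
CTE = """
WITH Grouped_Run AS (
    SELECT heartRate, dateTime,
           ROW_NUMBER() OVER (ORDER BY dateTime)
         - ROW_NUMBER() OVER (PARTITION BY heartRate ORDER BY dateTime) AS groupingId
    FROM HeartRate
)
"""

# Grouping by value and groupingId: one row per run, with its start, end and length.
runs = conn.execute(CTE + """
SELECT heartRate, MIN(dateTime), MAX(dateTime), COUNT(*)
FROM Grouped_Run
GROUP BY heartRate, groupingId
ORDER BY MIN(dateTime)
""").fetchall()

# Grouping by groupingId alone: the 80 run and the second 72 run both get
# groupingId 3, so they are wrongly merged into a single "run".
merged = conn.execute(CTE + """
SELECT groupingId, MIN(dateTime), MAX(dateTime), COUNT(*)
FROM Grouped_Run
GROUP BY groupingId
ORDER BY groupingId
""").fetchall()

print(runs)
print(merged)
```

In the merged result, the group with groupingId 3 spans 10:03 to 10:07 even though a 65 reading sits in the middle, which is why heartRate has to be part of the GROUP BY.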

Part of the question asked only for runs that repeated three or more times. HAVING COUNT(*) > 2 discards runs of length 2 or less; COUNT(*) counts the rows in each group, that is, in each run.
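If the minimum run length might change, one option is to write the filter as COUNT(*) >= n and pass n as a query parameter; for integer counts, COUNT(*) >= 3 is equivalent to COUNT(*) > 2. This is a sketch using Python's sqlite3 module (SQLite 3.25+ assumed) with hypothetical readings; the helper name runs_of_at_least is made up for illustration.

```python
import sqlite3

# Hypothetical readings: three 72s, two 80s, one 65, two more 72s.
READINGS = [
    ("10:00", 72), ("10:01", 72), ("10:02", 72),
    ("10:03", 80), ("10:04", 80),
    ("10:05", 65),
    ("10:06", 72), ("10:07", 72),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE HeartRate (dateTime TEXT, heartRate INTEGER)")
conn.executemany("INSERT INTO HeartRate (dateTime, heartRate) VALUES (?, ?)", READINGS)


def runs_of_at_least(conn, min_run_length=3):
    """Return (heartRate, start, end) for runs with at least min_run_length rows.

    The default of 3 matches the answer's HAVING COUNT(*) > 2.
    """
    return conn.execute("""
        WITH Grouped_Run AS (
            SELECT heartRate, dateTime,
                   ROW_NUMBER() OVER (ORDER BY dateTime)
                 - ROW_NUMBER() OVER (PARTITION BY heartRate ORDER BY dateTime) AS groupingId
            FROM HeartRate
        )
        SELECT heartRate, MIN(dateTime), MAX(dateTime)
        FROM Grouped_Run
        GROUP BY heartRate, groupingId
        HAVING COUNT(*) >= ?
        ORDER BY MIN(dateTime)
    """, (min_run_length,)).fetchall()


print(runs_of_at_least(conn))     # only the three-reading run of 72
print(runs_of_at_least(conn, 2))  # also the 80 pair and the second 72 pair
```

Binding the threshold as a parameter keeps it out of the SQL text, so the same statement serves any run length.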

I recommend Ben-Gan's articles on interval packing, which apply to your adjacency problem:

tsql-challenge-packing-date-and-time-intervals

solutions-to-packing-date-and-time-intervals-puzzle
