How to use a ring data structure in window functions

半城伤御伤魂 提交于 2019-12-17 14:01:28

问题


I have data that is arranged in a ring structure (or circular buffer), that is it can be expressed as sequences that cycle: ...-1-2-3-4-5-1-2-3-.... See this picture to get an idea of a 5-part ring:

I'd like to create a window query that can combine the lag and lead items into a three point array, but I can't figure it out. For example at part 1 of a 5-part ring, the lag/lead sequence is 5-1-2, or at part 4 is 3-4-5.

Here is an example table of two rings with different numbers of parts (always more than three per ring):

create table rp (ring int, part int);
insert into rp(ring, part) values(1, generate_series(1, 5));
insert into rp(ring, part) values(2, generate_series(1, 7));

Here is a nearly successful query:

SELECT ring, part, array[
    lag(part, 1, NULL) over (partition by ring),
    part,
    lead(part, 1, 1) over (partition by ring)
    ] AS neighbours
FROM rp;

 ring | part | neighbours
------+------+------------
    1 |    1 | {NULL,1,2}
    1 |    2 | {1,2,3}
    1 |    3 | {2,3,4}
    1 |    4 | {3,4,5}
    1 |    5 | {4,5,1}
    2 |    1 | {NULL,1,2}
    2 |    2 | {1,2,3}
    2 |    3 | {2,3,4}
    2 |    4 | {3,4,5}
    2 |    5 | {4,5,6}
    2 |    6 | {5,6,7}
    2 |    7 | {6,7,1}
(12 rows)

The only thing I need to do is to replace the NULL with the ending point of each ring, which is the last value. Now, along with lag and lead window functions, there is a last_value function which would be ideal. However, these cannot be nested:

SELECT ring, part, array[
    lag(part, 1, last_value(part) over (partition by ring)) over (partition by ring),
    part,
    lead(part, 1, 1) over (partition by ring)
    ] AS neighbours
FROM rp;
ERROR:  window function calls cannot be nested
LINE 2:     lag(part, 1, last_value(part) over (partition by ring)) ...

Update. Thanks to @Justin's suggestion to use coalesce to avoid nesting window functions. Furthermore, it has been pointed out by numerous folks that first/last values need an explicit order by on the ring sequence, which happens to be part for this example. So randomising the input data a bit:

create table rp (ring int, part int);
insert into rp(ring, part) select 1, generate_series(1, 5) order by random();
insert into rp(ring, part) select 2, generate_series(1, 7) order by random();

回答1:


  • Use COALESCE like @Justin provided.
  • With first_value() / last_value() you need to add an ORDER BY clause to the window definition or the order is undefined. You just got lucky in the example, because the rows happen to be in order right after creating the dummy table.
    Once you add ORDER BY, the default window frame ends at the current row, and you need to special case the last_value() call - or revert the sort order in the window frame like demonstrated in my first example.

  • When reusing a window definition multiple times, an explicit WINDOW clause simplifies syntax a lot:

SELECT ring, part, ARRAY[
          coalesce(
             lag(part) OVER w
            ,first_value(part) OVER (PARTITION BY ring ORDER BY part DESC))
         ,part
         ,coalesce(
             lead(part) OVER w
            ,first_value(part) OVER w)
         ] AS neighbours
FROM   rp
WINDOW w AS (PARTITION BY ring ORDER BY part);

Better yet, reuse the same window definition, so Postgres can calculate all values in a single scan. For this to work we need to define a custom window frame:

SELECT ring, part, ARRAY[
          coalesce(
             lag(part) OVER w
            ,last_value(part) OVER w)
         ,part
         ,coalesce(
             lead(part) OVER w
            ,first_value(part) OVER w)
         ] AS neighbours
FROM   rp
WINDOW w AS (PARTITION BY ring
             ORDER BY part
             RANGE BETWEEN UNBOUNDED PRECEDING
                       AND UNBOUNDED FOLLOWING)
ORDER  BY 1,2;

You can even adapt the frame definition for each window function call:

SELECT ring, part, ARRAY[
          coalesce(
             lag(part) OVER w
            ,last_value(part) OVER (w RANGE BETWEEN CURRENT ROW
                                                AND UNBOUNDED FOLLOWING))
         ,part
         ,coalesce(
             lead(part) OVER w
            ,first_value(part) OVER w)
         ] AS neighbours
FROM   rp
WINDOW w AS (PARTITION BY ring ORDER BY part)
ORDER  BY 1,2;

Might be faster for rings with many parts. You'll have to test.

SQL Fiddle demonstrating all three with an improved test case. Consider query plans.

More about window frame definitions:

  • In the manual.
  • PostgreSQL window function: partition by comparison
  • PostgreSQL query with max and min date plus associated id per row



回答2:


Query:

SQLFIDDLEExample

SELECT ring, part, array[
    coalesce(lag(part, 1, NULL) over (partition by ring), 
             max(part) over (partition by ring)),
    part,
    lead(part, 1, 1) over (partition by ring)
    ] AS neighbours
FROM rp;

Result:

| RING | PART | NEIGHBOURS |
|------|------|------------|
|    1 |    1 |      5,1,2 |
|    1 |    2 |      1,2,3 |
|    1 |    3 |      2,3,4 |
|    1 |    4 |      3,4,5 |
|    1 |    5 |      4,5,1 |
|    2 |    1 |      7,1,2 |
|    2 |    2 |      1,2,3 |
|    2 |    3 |      2,3,4 |
|    2 |    4 |      3,4,5 |
|    2 |    5 |      4,5,6 |
|    2 |    6 |      5,6,7 |
|    2 |    7 |      6,7,1 |


来源:https://stackoverflow.com/questions/25030179/how-to-use-a-ring-data-structure-in-window-functions

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!