PostgreSQL: detecting the first/last rows of result set

爷,独闯天下 提交于 2019-12-13 08:19:21

问题


Is there any way to embed a flag in a select that indicates that it is the first or the last row of a result set? I'm thinking something to the effect of:

> SELECT is_first_row() AS f, is_last_row() AS l FROM blah;
  f  |  l
-----------
  t  |  f
  f  |  f
  f  |  f
  f  |  f
  f  |  t

The answer might be in window functions but I've only just learned about them, and I question their efficiency.

SELECT first_value(unique_column) OVER () = unique_column, last_value(unique_column) OVER () = unique_column, * FROM blah;

seems to do what I want. Unfortunately, I don't even fully understand that syntax, but since unique_column is unique and NOT NULL it should deliver unambiguous results. But if it does sorting, then the cure might be worse than the disease. (Actually, in my tests, unique_column is not sorted, so that's something.)

EXPLAIN ANALYZE doesn't indicate there's an efficiency problem, but when has it ever told me what I needed to know?

And I might need to use this in an aggregate function, but I've just been told window functions aren't allowed there. 😕

Edit: Actually, I just added ORDER BY unique_column to the above query and the rows identified as first and last were thrown into the middle of the result set. It's as if first_value()/last_value() really means "the first/last value I picked up before I began sorting." I don't think I can safely do this optimally. Not unless a much better understanding of the use of the OVER keyword is to be had.

I'm running PostgreSQL 9.6 in a Debian 9.5 environment.

This isn't a duplicate, because I'm trying to get the first row and last row of the result set to identify themselves, while Postgres: get min, max, aggregate values in one select is just going for the minimum and maximum values for a column in a result set.


回答1:


You can use the lead() and lag() window functions (over the appropiate window) and compare them to NULL:


-- \i tmp.sql

CREATE TABLE ztable
( id SERIAL PRIMARY KEY
  , starttime TIMESTAMP
);

INSERT INTO ztable (starttime) VALUES ( now() - INTERVAL '1 minute');
INSERT INTO ztable (starttime) VALUES ( now() - INTERVAL '2 minute');
INSERT INTO ztable (starttime) VALUES ( now() - INTERVAL '3 minute');
INSERT INTO ztable (starttime) VALUES ( now() - INTERVAL '4 minute');
INSERT INTO ztable (starttime) VALUES ( now() - INTERVAL '5 minute');
INSERT INTO ztable (starttime) VALUES ( now() - INTERVAL '6 minute');

SELECT id, starttime
        , ( lead(id) OVER www IS NULL) AS is_first
        , ( lag(id) OVER www IS NULL) AS is_last
FROM ztable
WINDOW www AS (ORDER BY id )
ORDER BY id
        ;


SELECT id, starttime
        , ( lead(id) OVER www IS NULL) AS is_first
        , ( lag(id) OVER www IS NULL) AS is_last
FROM ztable
WINDOW www AS (ORDER BY starttime )
ORDER BY id
        ;

SELECT id, starttime
        , ( lead(id) OVER www IS NULL) AS is_first
        , ( lag(id) OVER www IS NULL) AS is_last
FROM ztable
WINDOW www AS (ORDER BY starttime )
ORDER BY random()
        ;

Result:


INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
 id |         starttime          | is_first | is_last 
----+----------------------------+----------+---------
  1 | 2018-08-31 18:38:45.567393 | f        | t
  2 | 2018-08-31 18:37:45.575586 | f        | f
  3 | 2018-08-31 18:36:45.587436 | f        | f
  4 | 2018-08-31 18:35:45.592316 | f        | f
  5 | 2018-08-31 18:34:45.600619 | f        | f
  6 | 2018-08-31 18:33:45.60907  | t        | f
(6 rows)

 id |         starttime          | is_first | is_last 
----+----------------------------+----------+---------
  1 | 2018-08-31 18:38:45.567393 | t        | f
  2 | 2018-08-31 18:37:45.575586 | f        | f
  3 | 2018-08-31 18:36:45.587436 | f        | f
  4 | 2018-08-31 18:35:45.592316 | f        | f
  5 | 2018-08-31 18:34:45.600619 | f        | f
  6 | 2018-08-31 18:33:45.60907  | f        | t
(6 rows)

 id |         starttime          | is_first | is_last 
----+----------------------------+----------+---------
  2 | 2018-08-31 18:37:45.575586 | f        | f
  4 | 2018-08-31 18:35:45.592316 | f        | f
  6 | 2018-08-31 18:33:45.60907  | f        | t
  5 | 2018-08-31 18:34:45.600619 | f        | f
  1 | 2018-08-31 18:38:45.567393 | t        | f
  3 | 2018-08-31 18:36:45.587436 | f        | f
(6 rows)

[updated: added a randomly sorted case]




回答2:


It is simple using window functions with particular frames:

with t(x, y) as (select generate_series(1,5), random()) 
select *,
  count(*) over (rows between unbounded preceding and current row),
  count(*) over (rows between current row and unbounded following)
from t;
┌───┬───────────────────┬───────┬───────┐
│ x │         y         │ count │ count │
├───┼───────────────────┼───────┼───────┤
│ 1 │ 0.543995119165629 │     1 │     5 │
│ 2 │ 0.886343683116138 │     2 │     4 │
│ 3 │ 0.124682310037315 │     3 │     3 │
│ 4 │ 0.668972567655146 │     4 │     2 │
│ 5 │ 0.266671542543918 │     5 │     1 │
└───┴───────────────────┴───────┴───────┘

As you can see count(*) over (rows between unbounded preceding and current row) returns rows count from the data set beginning to current row and count(*) over (rows between current row and unbounded following) returns rows count from the current to data set end. 1 indicates the first/last rows.

It works until you ordering your data set by order by. In this case you need to duplicate it in the frames definitions:

with t(x, y) as (select generate_series(1,5), random()) 
select *,
  count(*) over (order by y rows between unbounded preceding and current row),
  count(*) over (order by y rows between current row and unbounded following)
from t order by y;
┌───┬───────────────────┬───────┬───────┐
│ x │         y         │ count │ count │
├───┼───────────────────┼───────┼───────┤
│ 1 │ 0.125781774986535 │     1 │     5 │
│ 4 │  0.25046408502385 │     2 │     4 │
│ 5 │ 0.538880597334355 │     3 │     3 │
│ 3 │ 0.802807193249464 │     4 │     2 │
│ 2 │ 0.869908029679209 │     5 │     1 │
└───┴───────────────────┴───────┴───────┘

PS: As mentioned by a_horse_with_no_name in the comment:

there is no such thing as the "first" or "last" row without sorting.




回答3:


In fact, Window Functions are a great approach and for that requirement of yours, they are awesome.

Regarding efficiency, window functions work over the data set already at hand. Which means the DBMS will just add extra processing to infer first/last values.

Just one thing I'd like to suggest: I like to put an ORDER BY criteria inside the OVER clause, just to ensure the data set order is the same between multiple executions, thus returning the same values to you.




回答4:


Try using

SELECT columns 
FROM mytable 
Join conditions
WHERE conditions ORDER BY date DESC LIMIT 1

UNION ALL 

SELECT columns
FROM mytable 
Join conditions
WHERE conditions ORDER BY date ASC LIMIT 1

SELECT just cut half of the processing time. You can go for indexing also.



来源:https://stackoverflow.com/questions/52118140/postgresql-detecting-the-first-last-rows-of-result-set

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!