Unexpected behavior of window function first_value

Posted by ☆樱花仙子☆ on 2019-12-11 15:41:28

Question


I have two columns, OrderNo and Value. As a table value constructor:

(1, null)
,(2, 5)
,(3, null)
,(4, null)
,(5, 2)
,(6, 1)

I need to get

(1, 5) -- i.e. first nonnull Value if I go from current row and order by OrderNo
,(2, 5)
,(3, 2) -- i.e. first nonnull Value if I go from current row and order by OrderNo
,(4, 2) -- analogous
,(5, 2)
,(6, 1)

This is the query that I think should work:

;with SourceTable as (
    select *
        from (values
            (1, null)
            ,(2, 5)
            ,(3, null)
            ,(4, null)
            ,(5, 2)
            ,(6, 1)
        ) as T(OrderNo, Value)
)
select
       *
       ,first_value(Value) over (
           order by
               case when Value is not null then 0 else 1 end
               , OrderNo
           rows between current row and unbounded following
       ) as X
   from SourceTable
order by OrderNo

The issue is that it returns exactly the same result set as SourceTable, and I don't understand why. For example, when the first row is processed (OrderNo = 1), I'd expect column X to return 5, because the frame should include all rows (current row and unbounded following) and it is ordered by Value, non-nulls first, then by OrderNo. So the first row in the frame should be OrderNo = 2. Obviously it doesn't work like that, but I don't get why.

I would appreciate it if someone could explain how the first frame is constructed. I need this for SQL Server and also PostgreSQL.

Many thanks


Answer 1:


It's pretty easy to see why first_value doesn't work if you order the results the same way the window does, by case when Value is not null then 0 else 1 end, orderno:

 orderno | value | x
---------+-------+---
       2 |     5 | 5
       5 |     2 | 2
       6 |     1 | 1
       1 |       |
       3 |       |
       4 |       |
(6 rows)

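For orderno=1, the frame runs from the current row to the end of this ordering, so it contains only orderno 1, 3 and 4, and none of them has a non-null value. The window's order by decides which rows count as "following", not the outer order by.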
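
One way to see how each frame is constructed is to number the rows in the window's own ordering and count how many rows each frame holds. The following is a minimal sketch against the same sample data; the column aliases PosInWindow and RowsInFrame are made up for illustration, and it is expected to run on both SQL Server 2012+ and PostgreSQL:

```sql
with SourceTable as (
    select *
    from (values
        (1, null), (2, 5), (3, null), (4, null), (5, 2), (6, 1)
    ) as T(OrderNo, Value)
)
select
    OrderNo
    ,Value
    -- position of the row in the window's ordering (non-nulls first, then OrderNo)
    ,row_number() over (
        order by case when Value is not null then 0 else 1 end, OrderNo
    ) as PosInWindow
    -- number of rows from the current row to the end of that same ordering
    ,count(*) over (
        order by case when Value is not null then 0 else 1 end, OrderNo
        rows between current row and unbounded following
    ) as RowsInFrame
from SourceTable
order by PosInWindow;
```

For OrderNo = 1 this reports position 4 with 3 rows in the frame (OrderNo 1, 3 and 4), all of them null, which is why first_value returns null there.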

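Instead, we can arrange the rows into groups using count as a window function in a subquery. count(value) skips nulls, so a running count over orderno in descending order only increases at a non-null row, which puts each non-null value in the same group as the nulls that precede it. We then use max as a window function over that group (this is arbitrary, min would work just as well) to get the one non-null value in that group: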

with SourceTable as (
    select *
        from (values
            (1, null)
            ,(2, 5)
            ,(3, null)
            ,(4, null)
            ,(5, 2)
            ,(6, 1)
        ) as T(OrderNo, Value)
)
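select orderno, order_group,
       -- each group holds exactly one non-null value, so max() simply picks it
       max(value) over (partition by order_group)
from (
    select *,
           -- running count of non-null values, walking from the last row backwards
           count(value) over (order by orderno desc) as order_group
    from SourceTable
) as sub
order by orderno;
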
 orderno | order_group | max
---------+-------------+-----
       1 |           3 |   5
       2 |           3 |   5
       3 |           2 |   2
       4 |           2 |   2
       5 |           2 |   2
       6 |           1 |   1
(6 rows)
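
If the target is SQL Server 2022 or later, the grouping step may not be needed at all, because first_value accepts an IGNORE NULLS option there. A sketch, assuming that version; many PostgreSQL releases do not accept this clause, so check the documentation for yours before relying on it:

```sql
with SourceTable as (
    select *
    from (values
        (1, null), (2, 5), (3, null), (4, null), (5, 2), (6, 1)
    ) as T(OrderNo, Value)
)
select
    OrderNo
    ,Value
    -- order by OrderNo only; IGNORE NULLS skips null values inside the frame
    ,first_value(Value) ignore nulls over (
        order by OrderNo
        rows between current row and unbounded following
    ) as X
from SourceTable
order by OrderNo;
```

Here the window is ordered by OrderNo alone, so "following" means later order numbers, which is what the question was after.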



Answer 2:


Although probably more expensive than two window functions, you can do this in PostgreSQL without a subquery using arrays:

with SourceTable as (
      select *
      from (values (1, null),
                   (2, 5),
                   (3, null),
                   (4, null),
                   (5, 2),
                   (6, 1)
           ) T(OrderNo, Value)
)
select st.*,
       (array_remove(
            array_agg(value) over (order by orderno
                                   rows between current row and unbounded following),
            null))[1] as x
from SourceTable st
order by OrderNo;

Or using a lateral join (with the same SourceTable CTE as above):

select st.*, st2.value
from SourceTable st left join lateral
     (select st2.*
      from SourceTable st2
      where st2.value is not null and st2.orderno >= st.orderno
      order by st2.orderno asc
      limit 1
     ) st2
     on 1=1
order by OrderNo;

With the right indexes on the source table, the lateral join might be the best solution from a performance perspective (I have been surprised by the performance of lateral joins under the right circumstances).
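
SQL Server has no LATERAL keyword or LIMIT clause; the closest counterpart is OUTER APPLY with TOP. A sketch of the same lookup under that assumption (the alias nxt is only illustrative):

```sql
with SourceTable as (
    select *
    from (values
        (1, null), (2, 5), (3, null), (4, null), (5, 2), (6, 1)
    ) as T(OrderNo, Value)
)
select st.OrderNo, st.Value, nxt.Value as X
from SourceTable st
outer apply (
    -- first non-null value at or after the current OrderNo
    select top (1) st2.Value
    from SourceTable st2
    where st2.Value is not null and st2.OrderNo >= st.OrderNo
    order by st2.OrderNo asc
) nxt
order by st.OrderNo;
```

OUTER APPLY keeps rows for which the inner query finds nothing, in the same way the left join lateral ... on 1=1 form does.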



Source: https://stackoverflow.com/questions/58418463/unexpected-behavior-of-window-function-first-value
