Unexpected behavior in FIRST_VALUE() with IGNORE NULLS (Vertica)

Submitted by 安稳与你 on 2020-06-16 02:50:29

Question


I'm seeing unexpected behavior in Vertica's FIRST_VALUE() analytic function with the IGNORE NULLS parameter. It appears to return NULL when it shouldn't.

The issue occurs in this very tiny table:

drop table if exists temp;
create table temp (time_ timestamp(6), name varchar(10));
insert into temp (time_) values ('2016-03-18 20:32:16.144');
insert into temp (time_, name) values ('2016-03-18 20:52:09.062', 'abc');

Here are the contents of the table (select * from temp):

time_                   | name
------------------------+--------
2016-03-18 20:32:16.144 | <null>
2016-03-18 20:52:09.062 | abc

Here is the query I'm running:

select time_,
  first_value(name ignore nulls) over (order by time_) first_name
from temp;

Here are the results this query returns:

time_                   | first_name
------------------------+------------
2016-03-18 20:32:16.144 | <null>
2016-03-18 20:52:09.062 | abc

Here are the results I would expect (and desire) from this query:

time_                   | first_name
------------------------+------------
2016-03-18 20:32:16.144 | abc
2016-03-18 20:52:09.062 | abc

Does the above query have a very fundamental syntax mistake? This issue occurs on Vertica Community Edition 7.1.1.


Answer 1:


The function works as expected.
over (order by time_) is shorthand for over (order by time_ range unbounded preceding), which in turn is shorthand for over (order by time_ range between unbounded preceding and current row). Every row therefore sees only itself and the rows that precede it.
The first row sees only itself, so there is no non-NULL value in its scope.

If you want the first non-NULL value of the whole scope, you have to specify the whole scope:

first_value(name ignore nulls) over 
    (order by time_ range between unbounded preceding and unbounded following) first_name
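
To see the difference between the two windows, both can be run side by side against the temp table from the question. This is only a sketch and has not been run here; the column aliases default_frame and full_frame are made up for illustration.

```sql
-- Compare the implicit default frame with an explicit full frame.
-- Aliases default_frame and full_frame are illustrative names only.
select time_,
       -- what "over (order by time_)" expands to
       first_value(name ignore nulls) over (
           order by time_
           range between unbounded preceding and current row
       ) as default_frame,
       -- every row sees every row
       first_value(name ignore nulls) over (
           order by time_
           range between unbounded preceding and unbounded following
       ) as full_frame
from temp
order by time_;
```

With the two rows from the question, default_frame should be NULL for the first row and abc for the second, while full_frame should be abc for both, which is the result the question asks for.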

No, this is definitely not a bug.

You have probably been using syntax like sum(x) over (order by y) for running totals, where the default window of RANGE UNBOUNDED PRECEDING seemed very natural.
Since you did not define an explicit window for the FIRST_VALUE function, it used the same default window.

Here is another test case:

ts val
-- ----
1  NULL
2  X
3  NULL
4  Y
5  NULL

What would you expect to get from the following function?

last_value (val) over (order by ts)

What would you expect to get from the following function?

last_value (val ignore nulls) over (order by ts)
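
To check your answers, the test case can be loaded into a scratch table and both calls run together, each written with over (order by ts). This is an untested sketch; the table name t and the aliases last_val and last_non_null are assumptions made for illustration.

```sql
-- Hypothetical scratch table t holding the five rows of the test case.
drop table if exists t;
create table t (ts int, val varchar(10));
insert into t (ts) values (1);
insert into t (ts, val) values (2, 'X');
insert into t (ts) values (3);
insert into t (ts, val) values (4, 'Y');
insert into t (ts) values (5);

-- Both calls use the default frame:
-- range between unbounded preceding and current row.
select ts,
       val,
       last_value(val) over (order by ts) as last_val,
       last_value(val ignore nulls) over (order by ts) as last_non_null
from t
order by ts;
```

Because the default frame ends at the current row, last_val should simply repeat val (NULL, X, NULL, Y, NULL), while last_non_null should carry the most recent non-NULL value forward (NULL, X, X, Y, Y). Neither returns Y for every row, since no row can see past itself.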



Answer 2:


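This is where my thinking takes me: sort the non-NULL names ahead of the NULLs inside the window, so that the first row of every frame is the earliest non-NULL name and a plain first_value (without IGNORE NULLS) returns it.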

select time_
      ,first_value(name) over (order by case when name is null then 1 else 0 end,time_) FirstName
from temp A
order by time_

Returns

time_               FirstName
20:32:16.1440000    abc
20:52:09.0620000    abc


Source: https://stackoverflow.com/questions/40662817/unexpected-behavior-in-first-value-with-ignore-nulls-vertica
