Question
I'm seeing unexpected behavior in Vertica's FIRST_VALUE() analytic function with the IGNORE NULLS parameter. It appears to return NULL when it shouldn't.
The issue occurs with this tiny table:
drop table if exists temp;
create table temp (time_ timestamp(6), name varchar(10));
insert into temp (time_) values ('2016-03-18 20:32:16.144');
insert into temp (time_, name) values ('2016-03-18 20:52:09.062', 'abc');
Here are the contents of the table (select * from temp):
time_ | name
------------------------+--------
2016-03-18 20:32:16.144 | <null>
2016-03-18 20:52:09.062 | abc
Here is the query I'm running:
select time_,
first_value(name ignore nulls) over (order by time_) first_name
from temp;
Here are the results this query returns:
time_ | first_name
------------------------+------------
2016-03-18 20:32:16.144 | <null>
2016-03-18 20:52:09.062 | abc
Here are the results I would expect (and desire) from this query:
time_ | first_name
------------------------+------------
2016-03-18 20:32:16.144 | abc
2016-03-18 20:52:09.062 | abc
Is there a fundamental syntax mistake in the above query? This issue occurs on Vertica Community Edition 7.1.1.
Answer 1:
The function works as expected.
over (order by time_)
is a shortcut for
over (order by time_ range unbounded preceding)
which in turn is a shortcut for
over (order by time_ range between unbounded preceding and current row)
This means every row sees only the rows that precede it, plus itself (and any rows sharing its time_ value, since this is a RANGE frame).
The first row sees only itself, so there is no non-NULL value within its window.
If you want the first non-NULL value of the whole partition, you have to specify the whole window explicitly:
first_value(name ignore nulls) over
(order by time_ range between unbounded preceding and unbounded following) first_name
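To make the difference visible on the question's own temp table, the sketch below computes both frames side by side. It assumes the temp table created in the question; the column aliases default_frame and full_frame are illustrative names only.

```sql
-- Compare the implicit default frame with an explicit whole-partition frame
select time_,
       -- implicit frame: range between unbounded preceding and current row
       first_value(name ignore nulls) over (order by time_) as default_frame,
       -- explicit frame covering every row of the (single) partition
       first_value(name ignore nulls) over
           (order by time_ range between unbounded preceding and unbounded following) as full_frame
from temp
order by time_;
```

Per the explanation above, default_frame should reproduce the NULL on the first row, while full_frame should give abc on both rows.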
No, this is definitely not a bug.
You have probably been using syntax like sum(x) over (order by y)
for running totals, where the default window of RANGE UNBOUNDED PRECEDING seemed very natural.
Since you did not define an explicit window for the FIRST_VALUE function, it used that same default window.
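The same default frame applies to any aggregate used as an analytic function. As a hedged illustration on the question's temp table, count(name), which skips NULLs, acts as a running count under the default frame and as a whole-partition count once the frame is widened. The aliases running_count and total_count are illustrative.

```sql
select time_,
       -- default frame: count of non-NULL names from the first row up to the current row
       count(name) over (order by time_) as running_count,
       -- explicit whole-partition frame: the same total on every row
       count(name) over
           (order by time_ range between unbounded preceding and unbounded following) as total_count
from temp
order by time_;
```

With the two sample rows, running_count should be 0 then 1, while total_count should be 1 on both rows, which is exactly the running-total behavior that FIRST_VALUE inherited.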
Here is another test case:
ts val
-- ----
1 NULL
2 X
3 NULL
4 Y
5 NULL
What would you expect to get from the following function?
last_value(val) over (order by ts)
And what would you expect to get from this one?
last_value(val ignore nulls) over (order by ts)
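A sketch for running the test case above in Vertica. The table name test_case and the int/varchar column types are assumptions chosen for illustration; the rows are inserted one statement at a time, as in the question.

```sql
drop table if exists test_case;
create table test_case (ts int, val varchar(10));
insert into test_case values (1, null);
insert into test_case values (2, 'X');
insert into test_case values (3, null);
insert into test_case values (4, 'Y');
insert into test_case values (5, null);

select ts,
       val,
       -- default frame: from the first row up to (and including) the current row
       last_value(val) over (order by ts) as last_val,
       -- same frame, but NULLs are skipped when picking the last value
       last_value(val ignore nulls) over (order by ts) as last_val_ignore_nulls
from test_case
order by ts;
```

Because the default frame ends at the current row and ts is unique, last_val should simply echo val on each row, while last_val_ignore_nulls should carry the most recent non-NULL value forward (NULL, X, X, Y, Y).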
Answer 2:
This is where my thinking takes me:
select time_
,first_value(name) over (order by case when name is null then 1 else 0 end,time_) FirstName
from temp A
order by time_
This returns:
time_ FirstName
20:32:16.1440000 abc
20:52:09.0620000 abc
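This works because the sort key case when name is null then 1 else 0 end places every NULL row after all non-NULL rows, so the default frame of each row starts at the earliest non-NULL name (when one exists). A hedged side-by-side sketch on the question's temp table compares it with the explicit-frame approach from Answer 1; the aliases first_name_sorted and first_name_framed are illustrative.

```sql
select time_,
       -- Answer 2: sort NULL names last so each frame begins with a non-NULL name
       first_value(name) over
           (order by case when name is null then 1 else 0 end, time_) as first_name_sorted,
       -- Answer 1: IGNORE NULLS with an explicit whole-partition frame
       first_value(name ignore nulls) over
           (order by time_ range between unbounded preceding and unbounded following) as first_name_framed
from temp
order by time_;
```

Both columns should return abc on both rows of the sample data; if every name were NULL, both would return NULL.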
Source: https://stackoverflow.com/questions/40662817/unexpected-behavior-in-first-value-with-ignore-nulls-vertica