Question
Consider the following data:
category | index | value
-------------------------
cat 1 | 1 | 2
cat 1 | 2 | 3
cat 1 | 3 |
cat 1 | 4 | 1
cat 2 | 1 | 5
cat 2 | 2 |
cat 2 | 3 |
cat 2 | 4 | 6
cat 3 | 1 |
cat 3 | 2 |
cat 3 | 3 | 2
cat 3 | 4 | 1
I am trying to fill in the holes so that each hole is set to avg(value)
of its 2 nearest neighbours with non-null values within the same category:
category | index | value
-------------------------
cat 1 | 1 | 2
cat 1 | 2 | 3
cat 1 | 3 | 2*
cat 1 | 4 | 1
cat 2 | 1 | 5
cat 2 | 2 | 5.5*
cat 2 | 3 | 5.5*
cat 2 | 4 | 6
cat 3 | 1 | 1.5*
cat 3 | 2 | 1.5*
cat 3 | 3 | 2
cat 3 | 4 | 1
I've been playing with window functions and am pretty sure it can be achieved, but the solution is eluding me.
Any ideas?
Answer 1:
You are correct, a window function is what you're looking for. Here's how it can be done (the WITH
part is only used to define the sample table, so you probably won't need it):
with dt as
(
    -- sample data from the question
    select * from
    (
        values
            ('cat 1', 1, 2),
            ('cat 1', 2, 3),
            ('cat 1', 3, null),
            ('cat 1', 4, 1),
            ('cat 2', 1, 5),
            ('cat 2', 2, null),
            ('cat 2', 3, null),
            ('cat 2', 4, 6),
            ('cat 3', 1, null),
            ('cat 3', 2, null),
            ('cat 3', 3, 2),
            ('cat 3', 4, 1)
    ) tbl ("category", "index", "value")
)
select
    "category",
    "index",
    case
        -- fill a hole with the average of all non-null values in its category
        when "value" is null then (avg("value") over (partition by "category"))
        else "value"
    end as "value"
from dt
order by "category", "index";
Refer to the WINDOW Clause section of this page for further info on window functions.
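For the two-nearest-neighbours rule from the question specifically, a window function can rank the non-null rows of a category by their distance from each hole and keep only the closest two. What follows is only a sketch under stated assumptions, not part of the original answer: it uses Python's built-in sqlite3 module as a stand-in database (window functions need SQLite 3.25 or newer), renames the index column to idx because INDEX is an SQL keyword, and breaks distance ties in favour of the lower index.

```python
import sqlite3

# Sample data from the question; None marks a hole.
ROWS = [
    ("cat 1", 1, 2), ("cat 1", 2, 3), ("cat 1", 3, None), ("cat 1", 4, 1),
    ("cat 2", 1, 5), ("cat 2", 2, None), ("cat 2", 3, None), ("cat 2", 4, 6),
    ("cat 3", 1, None), ("cat 3", 2, None), ("cat 3", 3, 2), ("cat 3", 4, 1),
]

QUERY = """
WITH ranked AS (
    -- Pair every hole with every non-null row of the same category and
    -- rank those candidates by distance (ties go to the lower index).
    SELECT h.category, h.idx, n.value,
           ROW_NUMBER() OVER (
               PARTITION BY h.category, h.idx
               ORDER BY ABS(n.idx - h.idx), n.idx
           ) AS rn
    FROM dt AS h
    JOIN dt AS n
      ON n.category = h.category AND n.value IS NOT NULL
    WHERE h.value IS NULL
),
fills AS (
    -- Average the two closest candidates for each hole.
    SELECT category, idx, AVG(value) AS fill
    FROM ranked
    WHERE rn <= 2
    GROUP BY category, idx
)
SELECT d.category, d.idx, COALESCE(d.value, f.fill) AS value
FROM dt AS d
LEFT JOIN fills AS f ON f.category = d.category AND f.idx = d.idx
ORDER BY d.category, d.idx
"""


def fill_with_sql(rows):
    """Load rows into an in-memory table (hypothetical name dt) and run QUERY."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dt (category TEXT, idx INTEGER, value REAL)")
    conn.executemany("INSERT INTO dt VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in fill_with_sql(ROWS):
        print(row)
```

The same three steps (rank candidates, keep rn <= 2, average) should carry over to other databases with window-function support, though quoting and type details may differ.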
Answer 2:
I was working on a solution for you, but SQLfiddle is giving (internal) errors at the moment, so I can't complete it.
A statement like this should do the update for you:
update table1 as t1
set value =
    (select avg(value)
     from
        (select value
         from table1 as t3
         where t1.category = t3.category
           -- only the directly adjacent rows (index - 1 and index + 1) are considered
           and t3.index in (t1.index - 1, t1.index + 1)
        ) AS T2
    )
where value is null
;
The fiddle I was working on is here: http://sqlfiddle.com/#!15/acbc2/1
Answer 3:
While I am sure it's possible to make some hideously complicated and nested statement that does what you want, I wanted to say that, sometimes, it's better to write a script in a regular programming language such as Python/Ruby/Java that iterates over the DB table and makes whatever changes you want.
This will be a great deal more maintainable, and you won't have to rearchitect the whole thing every time you need to make any change to it (such as using 3 nearest neighbors instead, or changing the definition of 'nearest neighbor').
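As a rough illustration of that script-based approach, here is a minimal Python sketch. It assumes the rows have already been fetched from the table as (category, index, value) tuples with None for holes. The function name fill_holes, the parameter k, and the tie-breaking rule (lower index wins on equal distance) are illustrative assumptions, not something the thread specifies.

```python
from collections import defaultdict


def fill_holes(rows, k=2):
    """Replace None values with the mean of the k nearest non-null values
    (by index distance) within the same category."""
    # Collect the known (index, value) pairs per category.
    known = defaultdict(list)
    for category, index, value in rows:
        if value is not None:
            known[category].append((index, value))

    filled = []
    for category, index, value in rows:
        if value is None:
            # Nearest first; ties are broken by the lower index (an assumption).
            nearest = sorted(
                known[category],
                key=lambda pair: (abs(pair[0] - index), pair[0]),
            )[:k]
            if nearest:  # leave the hole as None if the category has no values
                value = sum(v for _, v in nearest) / len(nearest)
        filled.append((category, index, value))
    return filled


if __name__ == "__main__":
    # Sample data from the question.
    rows = [
        ("cat 1", 1, 2), ("cat 1", 2, 3), ("cat 1", 3, None), ("cat 1", 4, 1),
        ("cat 2", 1, 5), ("cat 2", 2, None), ("cat 2", 3, None), ("cat 2", 4, 6),
        ("cat 3", 1, None), ("cat 3", 2, None), ("cat 3", 3, 2), ("cat 3", 4, 1),
    ]
    for row in fill_holes(rows):
        print(row)
```

Switching to 3 nearest neighbours is then a matter of calling fill_holes(rows, k=3), and a different notion of "nearest" only requires changing the sort key.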
Source: https://stackoverflow.com/questions/31738642/select-nearest-neighbours