Select nearest neighbours

Question

Consider the following data:

category | index | value
-------------------------
cat 1    | 1     | 2
cat 1    | 2     | 3
cat 1    | 3     |  
cat 1    | 4     | 1
cat 2    | 1     | 5
cat 2    | 2     |  
cat 2    | 3     |  
cat 2    | 4     | 6
cat 3    | 1     |  
cat 3    | 2     |  
cat 3    | 3     | 2 
cat 3    | 4     | 1

I am trying to fill in the holes so that each hole = avg(value) of the 2 nearest neighbours (by index) with non-null values within the same category (filled values are marked with *):

category | index | value
-------------------------
cat 1    | 1     | 2
cat 1    | 2     | 3
cat 1    | 3     | 2*
cat 1    | 4     | 1
cat 2    | 1     | 5
cat 2    | 2     | 5.5*
cat 2    | 3     | 5.5* 
cat 2    | 4     | 6
cat 3    | 1     | 1.5*
cat 3    | 2     | 1.5* 
cat 3    | 3     | 2 
cat 3    | 4     | 1
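To pin down the rule, here is a minimal plain-Python sketch that fills each hole with the mean of the k = 2 non-null values closest by index in the same category. The names ROWS and fill_holes are hypothetical, and breaking distance ties toward the lower index is an assumption (in this sample no tie changes the result).

```python
# The question's data: (category, index, value); None marks a hole.
ROWS = [
    ("cat 1", 1, 2), ("cat 1", 2, 3), ("cat 1", 3, None), ("cat 1", 4, 1),
    ("cat 2", 1, 5), ("cat 2", 2, None), ("cat 2", 3, None), ("cat 2", 4, 6),
    ("cat 3", 1, None), ("cat 3", 2, None), ("cat 3", 3, 2), ("cat 3", 4, 1),
]


def fill_holes(rows, k=2):
    """Replace each None with the mean of the k non-null values closest by
    index within the same category (None if the category has no values)."""
    known = {}
    for cat, idx, val in rows:
        if val is not None:
            known.setdefault(cat, []).append((idx, val))
    result = []
    for cat, idx, val in rows:
        if val is None:
            # Sort candidates by distance; ties go to the lower index (assumption).
            nearest = sorted(known.get(cat, []), key=lambda p: (abs(p[0] - idx), p[0]))[:k]
            val = sum(v for _, v in nearest) / len(nearest) if nearest else None
        result.append((cat, idx, val))
    return result


for row in fill_holes(ROWS):
    print(row)
```

Run on the sample, this reproduces the expected table above: 2 for cat 1, 5.5 for both cat 2 holes and 1.5 for both cat 3 holes.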

I've been playing with window functions and am pretty sure it can be achieved, but the solution is eluding me.

Any ideas?

Answer 1:

You are correct: window functions are what you're looking for. Here's how it can be done (the "with" part only defines the sample table, so you probably won't need it):

with dt as
(
    select * from
    (
        values
            ('cat 1', 1, 2),
            ('cat 1', 2, 3),
            ('cat 1', 3, null),
            ('cat 1', 4, 1),
            ('cat 2', 1, 5),
            ('cat 2', 2, null),
            ('cat 2', 3, null),
            ('cat 2', 4, 6),
            ('cat 3', 1, null),
            ('cat 3', 2, null),
            ('cat 3', 3, 2),
            ('cat 3', 4, 1)

    ) tbl ("category", "index", "value")
)
select
        "category",
        "index",
            -- avg over the partition uses every non-null value in the category
            case
                when "value" is null then (avg("value") over (partition by "category"))
                else "value"
            end as "value"
    from dt
    order by "category", "index";

Refer to the WINDOW Clause section of the PostgreSQL SELECT documentation for further info on window functions.
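Note that avg("value") over (partition by "category") averages every non-null value in the category, not just the two nearest ones. The minimal Python sketch below compares the two rules; the extra category "cat X" is a made-up example that is not part of the question's data, and the function names are hypothetical.

```python
# The question's data plus a made-up category "cat X" (hypothetical, for contrast).
ROWS = [
    ("cat 1", 1, 2), ("cat 1", 2, 3), ("cat 1", 3, None), ("cat 1", 4, 1),
    ("cat 2", 1, 5), ("cat 2", 2, None), ("cat 2", 3, None), ("cat 2", 4, 6),
    ("cat 3", 1, None), ("cat 3", 2, None), ("cat 3", 3, 2), ("cat 3", 4, 1),
]
CAT_X = [("cat X", 1, 1), ("cat X", 2, None), ("cat X", 3, 3), ("cat X", 4, 10)]


def known_values(rows):
    """Group the non-null (index, value) pairs by category."""
    known = {}
    for cat, idx, val in rows:
        if val is not None:
            known.setdefault(cat, []).append((idx, val))
    return known


def fill_category_mean(rows):
    """Mimic avg("value") over (partition by "category"): each hole gets the
    mean of ALL non-null values in its category (None if there are none)."""
    known = known_values(rows)
    out = []
    for cat, idx, val in rows:
        if val is None:
            vals = [v for _, v in known.get(cat, [])]
            val = sum(vals) / len(vals) if vals else None
        out.append((cat, idx, val))
    return out


def fill_nearest(rows, k=2):
    """Each hole gets the mean of the k non-null values closest by index."""
    known = known_values(rows)
    out = []
    for cat, idx, val in rows:
        if val is None:
            nearest = sorted(known.get(cat, []), key=lambda p: (abs(p[0] - idx), p[0]))[:k]
            val = sum(v for _, v in nearest) / len(nearest) if nearest else None
        out.append((cat, idx, val))
    return out


print(fill_category_mean(ROWS) == fill_nearest(ROWS))  # True on the question's data
print(fill_category_mean(CAT_X)[1], fill_nearest(CAT_X)[1])  # the two rules differ here
```

On the question's data both rules give the same numbers, because cat 2 and cat 3 have only two non-null values each, and in cat 1 the mean of all three values (2, 3, 1) equals the mean of the two neighbours (3, 1). On the made-up cat X the category mean is 14/3 (about 4.67) while the two nearest neighbours give 2.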

Answer 2:

I was working on a solution for you, but SQL Fiddle is giving internal errors at the moment, so I can't finish it.

A statement like this should do the update for you:

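update table1 as t1
set value =
  (select avg(t3.value)
   from table1 as t3
   where t3.category = t1.category
     -- only the rows directly before and after the hole are used; the
     -- subquery sees the table as it was before this statement started
     and t3.index in (t1.index - 1, t1.index + 1)
  )
where value is null
;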

The fiddle I was working on is here: http://sqlfiddle.com/#!15/acbc2/1
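For comparison, here is a minimal Python sketch of what the UPDATE above computes on the question's data. It assumes PostgreSQL's usual behaviour that the subquery reads the table as it was before the statement began (rows changed by the same statement are not visible) and that avg() skips NULLs; fill_adjacent is a hypothetical name.

```python
# The question's data: (category, index, value); None stands for NULL.
ROWS = [
    ("cat 1", 1, 2), ("cat 1", 2, 3), ("cat 1", 3, None), ("cat 1", 4, 1),
    ("cat 2", 1, 5), ("cat 2", 2, None), ("cat 2", 3, None), ("cat 2", 4, 6),
    ("cat 3", 1, None), ("cat 3", 2, None), ("cat 3", 3, 2), ("cat 3", 4, 1),
]


def fill_adjacent(rows):
    """Each hole becomes the mean of the values at index - 1 and index + 1 in
    the same category, read from the pre-update snapshot. NULLs are skipped
    like SQL avg(); a hole with no non-null neighbour stays None."""
    snapshot = {(cat, idx): val for cat, idx, val in rows}
    out = []
    for cat, idx, val in rows:
        if val is None:
            present = [
                snapshot[(cat, i)]
                for i in (idx - 1, idx + 1)
                if snapshot.get((cat, i)) is not None
            ]
            val = sum(present) / len(present) if present else None
        out.append((cat, idx, val))
    return out


for row in fill_adjacent(ROWS):
    print(row)
```

Under those assumptions a single hole such as cat 1/index 3 gets the expected 2, but runs of holes do not: cat 2 gets 5 and 6 instead of 5.5, cat 3/index 2 gets 2 instead of 1.5, and cat 3/index 1 stays NULL because its neighbour at index 2 is itself NULL and there is no row at index 0.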

Answer 3:

While I am sure it's possible to write some hideously complicated, nested statement that does what you want, I wanted to point out that sometimes it's better to write a script in a regular programming language such as Python, Ruby or Java that iterates over the DB table and makes whatever changes you want.

This will be a great deal more maintainable, and you won't have to re-architect the whole thing every time you need to change it (for example, to use the 3 nearest neighbors instead, or to change the definition of 'nearest neighbor').
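A minimal sketch of the kind of script this answer describes, in Python with the standard-library sqlite3 module and an in-memory database standing in for the real PostgreSQL table. The table name table1 is borrowed from Answer 2; fill_holes_in_db and its k parameter are hypothetical. Switching to 3 nearest neighbours is then just a matter of passing k=3.

```python
import sqlite3

# The question's data: (category, index, value); None becomes NULL.
ROWS = [
    ("cat 1", 1, 2), ("cat 1", 2, 3), ("cat 1", 3, None), ("cat 1", 4, 1),
    ("cat 2", 1, 5), ("cat 2", 2, None), ("cat 2", 3, None), ("cat 2", 4, 6),
    ("cat 3", 1, None), ("cat 3", 2, None), ("cat 3", 3, 2), ("cat 3", 4, 1),
]


def fill_holes_in_db(conn, k=2):
    """Set each NULL "value" in table1 to the mean of the k non-null values
    closest by "index" within the same category."""
    rows = conn.execute(
        'SELECT category, "index", "value" FROM table1 ORDER BY category, "index"'
    ).fetchall()
    known = {}
    for cat, idx, val in rows:
        if val is not None:
            known.setdefault(cat, []).append((idx, val))
    updates = []
    for cat, idx, val in rows:
        if val is None:
            # Nearest first; ties go to the lower index (an arbitrary choice).
            nearest = sorted(known.get(cat, []), key=lambda p: (abs(p[0] - idx), p[0]))[:k]
            if nearest:
                mean = sum(v for _, v in nearest) / len(nearest)
                updates.append((mean, cat, idx))
    conn.executemany(
        'UPDATE table1 SET "value" = ? WHERE category = ? AND "index" = ?', updates
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
# "index" is quoted because INDEX is an SQLite keyword.
conn.execute('CREATE TABLE table1 (category TEXT, "index" INTEGER, "value" REAL)')
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)", ROWS)
fill_holes_in_db(conn, k=2)
for row in conn.execute('SELECT category, "index", "value" FROM table1 ORDER BY category, "index"'):
    print(row)
```

Because all rows are read before any UPDATE is issued, each hole is computed from the original values only, which matches the rule in the question rather than letting freshly filled holes feed into their neighbours.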

Source: https://stackoverflow.com/questions/31738642/select-nearest-neighbours

Tags

sql

postgresql

window-functions