Efficient time series querying in Postgres

逝去的感伤 2020-12-30 13:05

I have a table in my PG db that looks somewhat like this:

id | widget_id | for_date | score |

Each referenced widget has a lot of these ite

4 answers
  •  予麋鹿 (original poster)
    2020-12-30 13:43

    First of all, you can use a much simpler generate_series() expression. It is equivalent to yours, except for the descending order, which contradicts the rest of your question anyway:

    SELECT generate_series('2012-01-01'::date, now()::date, '1d')::date
    

    The date arguments are coerced to timestamptz automatically on input, and the return type is timestamptz either way. I use a subquery below so I can cast the output to date right away.
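
    One way to check the coercion described above is to ask Postgres for the types directly. This is only a sketch with arbitrary dates; pg_typeof() is a built-in function, and the alias g(d) is a name chosen for this illustration. If the statement above holds, raw_type should come back as timestamp with time zone and cast_type as date.

    ```sql
    -- Inspect the type generate_series() returns for date arguments,
    -- and the type after the cast used throughout this answer.
    SELECT pg_typeof(g.d)       AS raw_type
         , pg_typeof(g.d::date) AS cast_type
    FROM   generate_series('2012-01-01'::date, '2012-01-03'::date, '1d') AS g(d)
    LIMIT  1;  -- one row is enough, every row has the same type
    ```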

    Next, max() as a window function returns exactly what you need: the highest value since the start of the frame, ignoring NULL values. Building on that, you get a radically simple query.
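
    To see the running max() in isolation, here is a minimal sketch on made-up rows (the VALUES list and the alias t are assumptions for illustration, not part of the question). It mimics the result of the LEFT JOIN used below: for_date is either NULL or equal to day, so it rises with the sort order and the running maximum is always the most recent non-NULL date.

    ```sql
    SELECT t.day
         , t.for_date
         , max(t.for_date) OVER (ORDER BY t.day) AS effective_date
    FROM  (
       VALUES ('2012-01-01'::date, NULL::date)   -- no score yet
            , ('2012-01-02', '2012-01-02')       -- score recorded
            , ('2012-01-03', NULL)               -- gap
            , ('2012-01-04', '2012-01-04')       -- score recorded
            , ('2012-01-05', NULL)               -- gap
       ) t(day, for_date)
    ORDER  BY t.day;
    -- effective_date per row: NULL, 2012-01-02, 2012-01-02, 2012-01-04, 2012-01-04
    ```

    With ORDER BY in the window definition and no explicit frame clause, the frame runs from the first row up to the current row, which is what makes this behave like a forward fill.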

    For a given widget_id

    Most likely faster than involving CROSS JOIN or WITH RECURSIVE:

    SELECT a.day, s.*
    FROM  (
       SELECT d.day
             ,max(s.for_date) OVER (ORDER BY d.day) AS effective_date
       FROM  (
          SELECT generate_series('2012-01-01'::date, now()::date, '1d')::date
          ) d(day)
       LEFT   JOIN score s ON s.for_date = d.day
                          AND s.widget_id = 1337 -- "for a given widget_id"
       ) a
    LEFT   JOIN score s ON s.for_date = a.effective_date
                       AND s.widget_id = 1337
    ORDER  BY a.day;
    


    With this query you can put any column from score you like into the final SELECT list. I put s.* for simplicity. Pick your columns.

    If you want to start your output with the first day that actually has a score, simply replace the last LEFT JOIN with JOIN.
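
    To try the single-widget query without touching a real table, the sketch below replaces score with a CTE of the same name holding three made-up rows. The column types (integer ids, integer score) are assumptions, since the question does not state them. It uses the JOIN variant and picks explicit columns instead of s.*.

    ```sql
    WITH score(id, widget_id, for_date, score) AS (
       VALUES (1, 1337, '2012-01-02'::date, 10)
            , (2, 1337, '2012-01-05', 20)
            , (3,   42, '2012-01-03', 99)   -- another widget, must not show up
       )
    SELECT a.day, s.for_date, s.score
    FROM  (
       SELECT d.day
            , max(s.for_date) OVER (ORDER BY d.day) AS effective_date
       FROM  (
          SELECT generate_series('2012-01-01'::date, '2012-01-06'::date, '1d')::date
          ) d(day)
       LEFT   JOIN score s ON s.for_date = d.day
                          AND s.widget_id = 1337
       ) a
    JOIN   score s ON s.for_date = a.effective_date  -- JOIN drops days before the first score
                  AND s.widget_id = 1337
    ORDER  BY a.day;
    ```

    On this sample data the result has five rows: 2012-01-02 through 2012-01-04 carry score 10, and 2012-01-05 and 2012-01-06 carry score 20. The day 2012-01-01 is missing because no score exists yet; switching back to LEFT JOIN would return it with NULL values.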

    Generic form for all widget_id's

    Here I use a CROSS JOIN to produce a row for every widget on every date:

    SELECT a.day, a.widget_id, s.score
    FROM  (
       SELECT d.day, w.widget_id
             ,max(s.for_date) OVER (PARTITION BY w.widget_id
                                    ORDER BY d.day) AS effective_date
       FROM  (SELECT generate_series('2012-05-05'::date
                                    ,'2012-05-15'::date, '1d')::date AS day) d
       CROSS  JOIN (SELECT DISTINCT widget_id FROM score) AS w
       LEFT   JOIN score s ON s.for_date = d.day AND s.widget_id = w.widget_id
       ) a
    JOIN  score s ON s.for_date = a.effective_date
                 AND s.widget_id = a.widget_id  -- instead of LEFT JOIN
    ORDER BY a.day, a.widget_id;
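
    The same kind of self-contained check works for the generic form. In this sketch score is again a CTE with made-up rows and assumed column types, and the date range is shortened to six days so the output stays readable.

    ```sql
    WITH score(widget_id, for_date, score) AS (
       VALUES (1, '2012-05-06'::date, 10)
            , (1, '2012-05-09', 30)
            , (2, '2012-05-07', 50)
       )
    SELECT a.day, a.widget_id, s.score
    FROM  (
       SELECT d.day, w.widget_id
            , max(s.for_date) OVER (PARTITION BY w.widget_id
                                    ORDER BY d.day) AS effective_date
       FROM  (SELECT generate_series('2012-05-05'::date
                                   , '2012-05-10'::date, '1d')::date AS day) d
       CROSS  JOIN (SELECT DISTINCT widget_id FROM score) AS w
       LEFT   JOIN score s ON s.for_date = d.day AND s.widget_id = w.widget_id
       ) a
    JOIN   score s ON s.for_date = a.effective_date
                  AND s.widget_id = a.widget_id
    ORDER  BY a.day, a.widget_id;
    ```

    This returns nine rows on the sample data: widget 1 shows score 10 from 2012-05-06 to 2012-05-08 and score 30 on the last two days, while widget 2 shows score 50 from 2012-05-07 onward. PARTITION BY keeps each widget's running maximum separate. Note that a score dated before the first day of the series is never matched by the inner LEFT JOIN, so it is not carried into the range.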
    

