Postgres - how to return rows with 0 count for missing data?

孤街浪徒 2020-12-05 03:05

I have unevenly distributed data (with respect to date) spanning a few years (2003-2008). I want to query data for a given start and end date, grouping the data by any of the supported

3 answers
  • 2020-12-05 03:51

    You can create a list of the first day of each month over the last year (say) with

    select distinct date_trunc('month', (current_date - offs)) as date
    from generate_series(0,365,28) as offs
    order by 1;  -- DISTINCT alone does not guarantee the sorted order shown below
              date
    ------------------------
     2007-12-01 00:00:00+01
     2008-01-01 00:00:00+01
     2008-02-01 00:00:00+01
     2008-03-01 00:00:00+01
     2008-04-01 00:00:00+02
     2008-05-01 00:00:00+02
     2008-06-01 00:00:00+02
     2008-07-01 00:00:00+02
     2008-08-01 00:00:00+02
     2008-09-01 00:00:00+02
     2008-10-01 00:00:00+02
     2008-11-01 00:00:00+01
     2008-12-01 00:00:00+01
    

    Then you can join with that series.
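
    A minimal sketch of that join, under stated assumptions: the table demo_events, its column created_at, and the view demo_monthly are hypothetical names that do not appear in the question, and a fixed reference date replaces current_date so the result is reproducible.

    ```sql
    -- Hypothetical sample data standing in for the real table.
    CREATE TEMP TABLE demo_events (created_at date);
    INSERT INTO demo_events VALUES ('2008-01-15'), ('2008-01-20'), ('2008-03-02');

    -- One row per month of the year leading up to the reference date,
    -- left-joined to the data so that months without rows are kept.
    CREATE TEMP VIEW demo_monthly AS
    SELECT m.month_start, count(e.created_at) AS event_count
    FROM  (
       SELECT DISTINCT date_trunc('month', date '2008-03-31' - offs)::date AS month_start
       FROM   generate_series(0, 365, 28) AS offs
       ) m
    LEFT   JOIN demo_events e
           ON date_trunc('month', e.created_at)::date = m.month_start
    GROUP  BY m.month_start;

    SELECT * FROM demo_monthly ORDER BY month_start;
    ```

    Note that the sketch counts e.created_at rather than using count(*): after a LEFT JOIN every month has at least one row, so count(*) would report 1 for empty months, while counting a column from the joined table reports 0.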

  • 2020-12-05 03:58

    You could create a temporary table at runtime and left join on that. That seems to make the most sense.
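
    A rough sketch of the temporary-table idea, assuming monthly buckets. The names demo_rows, demo_calendar, date_col and month_start are made up for illustration, and the calendar rows are typed in by hand here; they could equally be filled from generate_series().

    ```sql
    -- Hypothetical sample data standing in for the real table.
    CREATE TEMP TABLE demo_rows (date_col date);
    INSERT INTO demo_rows VALUES ('2008-02-03'), ('2008-02-17'), ('2008-04-09');

    -- Temporary calendar table: one row per month in the reporting range.
    CREATE TEMP TABLE demo_calendar (month_start date PRIMARY KEY);
    INSERT INTO demo_calendar VALUES
       ('2008-01-01'), ('2008-02-01'), ('2008-03-01'), ('2008-04-01');

    -- count(r.date_col) counts only matched rows, so empty months yield 0.
    SELECT c.month_start, count(r.date_col) AS row_count
    FROM   demo_calendar c
    LEFT   JOIN demo_rows r
           ON  r.date_col >= c.month_start
           AND r.date_col <  c.month_start + interval '1 month'
    GROUP  BY c.month_start
    ORDER  BY c.month_start;
    ```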

  • 2020-12-05 03:59

    This question is old. But since fellow users picked it as master for a new duplicate I am adding a proper answer.

    Proper solution

    SELECT *
    FROM  (
       SELECT day::date
       FROM   generate_series(timestamp '2007-12-01'
                            , timestamp '2008-12-01'
                            , interval  '1 month') day
       ) d
    LEFT   JOIN (
       SELECT date_trunc('month', date_col)::date AS day
            , count(*) AS some_count
       FROM   tbl
       WHERE  date_col >= date '2007-12-01'
       AND    date_col <= date '2008-12-06'
    -- AND    ... more conditions
       GROUP  BY 1
       ) t USING (day)
    ORDER  BY day;
    
    • Use LEFT JOIN, of course.

    • generate_series() can produce a table of timestamps on the fly, and very fast.

    • It's generally faster to aggregate before you join. I recently provided a test case on sqlfiddle.com in this related answer:

      • PostgreSQL - order by an array
    • Cast the timestamp to date (::date) for a basic format. For more control over the output format, use to_char().

    • GROUP BY 1 is syntax shorthand to reference the first output column. Could be GROUP BY day as well, but that might conflict with an existing column of the same name. Or GROUP BY date_trunc('month', date_col)::date but that's too long for my taste.

    • The same approach works for the other units that date_trunc() accepts (such as 'day' or 'year'); adjust the generate_series() step to match.

    • count() never returns NULL (it returns 0 when there are no rows), but the LEFT JOIN produces NULL for months that have no matching row in the subquery.
      To return 0 instead of NULL in the outer SELECT, use COALESCE(some_count, 0) AS some_count. See the manual on COALESCE.

    • For a more generic solution or arbitrary time intervals consider this closely related answer:

      • Best way to count records by arbitrary time intervals in Rails+Postgres
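
    The COALESCE and to_char() points above can be sketched together as follows. This is an illustration under assumptions, not part of the original query: demo_tbl stands in for tbl with made-up rows, the range is shortened to four months, and the view name demo_report and the column month_label are hypothetical.

    ```sql
    -- Hypothetical sample data; the real tbl(date_col) is assumed to look like this.
    CREATE TEMP TABLE demo_tbl (date_col date);
    INSERT INTO demo_tbl VALUES ('2007-12-24'), ('2008-02-10'), ('2008-02-11');

    CREATE TEMP VIEW demo_report AS
    SELECT day
         , to_char(day, 'YYYY-MM')   AS month_label   -- formatted label
         , COALESCE(t.some_count, 0) AS some_count    -- 0 instead of NULL
    FROM  (
       SELECT day::date
       FROM   generate_series(timestamp '2007-12-01'
                            , timestamp '2008-03-01'
                            , interval  '1 month') day
       ) d
    LEFT   JOIN (
       -- Aggregate before joining, as in the query above.
       SELECT date_trunc('month', date_col)::date AS day
            , count(*) AS some_count
       FROM   demo_tbl
       GROUP  BY 1
       ) t USING (day);

    SELECT * FROM demo_report ORDER BY day;
    ```

    With this sample data, January and March 2008 have no rows in demo_tbl, so the LEFT JOIN leaves some_count NULL for them and COALESCE turns that into 0.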