Total Number of Records per Week

Submitted by 萝らか妹 on 2021-02-08 08:16:01

Question


I have a Postgres 9.1 database. I am trying to generate the number of records per week (for a given date range) and compare it to the previous year.

I have the following code used to generate the series:

select generate_series('2013-01-01', '2013-01-31', '7 day'::interval) as series

However, I am not sure how to join the counted records to the dates generated.

So, using the following records as an example:

Pt_ID      exam_date
======     =========
1          2012-01-02
2          2012-01-02
3          2012-01-08
4          2012-01-08
1          2013-01-02
2          2013-01-02
3          2013-01-03
4          2013-01-04
1          2013-01-08
2          2013-01-10
3          2013-01-15
4          2013-01-24

I wanted to have the records return as:

  series        thisyr      lastyr
===========     =====       =====
2013-01-01        4           2
2013-01-08        3           2
2013-01-15        1           0
2013-01-22        1           0
2013-01-29        0           0

I am not sure how to reference the date range in the subquery. Thanks for any assistance.


Answer 1:


Using a cross join should work. I'm just going to paste the markdown output from SQL Fiddle below. It would seem that your sample output is incorrect for series 2013-01-08: thisyr should be 2, not 3, because 2013-01-15 already falls into the next week. This might not be the best way to do this, though; my PostgreSQL knowledge leaves a lot to be desired.

SQL Fiddle

PostgreSQL 9.2.4 Schema Setup:

CREATE TABLE Table1
    ("Pt_ID" varchar(6), "exam_date" date);

INSERT INTO Table1
    ("Pt_ID", "exam_date")
VALUES
    ('1', '2012-01-02'),('2', '2012-01-02'),
    ('3', '2012-01-08'),('4', '2012-01-08'),
    ('1', '2013-01-02'),('2', '2013-01-02'),
    ('3', '2013-01-03'),('4', '2013-01-04'),
    ('1', '2013-01-08'),('2', '2013-01-10'),
    ('3', '2013-01-15'),('4', '2013-01-24');

Query 1:

select 
  series, 
  sum (
    case 
      when exam_date 
        between series and series + '6 day'::interval
      then 1 
      else 0 
    end
  ) as thisyr,
  sum (
    case 
      when exam_date + '1 year'::interval 
        between series and series + '6 day'::interval
      then 1 else 0 
    end
  ) as lastyr

from table1
cross join generate_series('2013-01-01', '2013-01-31', '7 day'::interval) as series
group by series
order by series

Results:

|                         SERIES | THISYR | LASTYR |
|--------------------------------|--------|--------|
| January, 01 2013 00:00:00+0000 |      4 |      2 |
| January, 08 2013 00:00:00+0000 |      2 |      2 |
| January, 15 2013 00:00:00+0000 |      1 |      0 |
| January, 22 2013 00:00:00+0000 |      1 |      0 |
| January, 29 2013 00:00:00+0000 |      0 |      0 |
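
As a quick cross-check outside the database, the same weekly bucketing can be sketched in plain Python. This is only an illustration under stated assumptions: the names `EXAM_DATES` and `weekly_counts` are made up for this example, and last year's bucket is found by shifting the week start back one calendar year (rather than shifting each exam_date forward as the SQL does), which gives the same counts for this sample data.

```python
from datetime import date, timedelta

# Sample exam dates copied from the question (Pt_ID omitted; only dates matter).
EXAM_DATES = [
    date(2012, 1, 2), date(2012, 1, 2), date(2012, 1, 8), date(2012, 1, 8),
    date(2013, 1, 2), date(2013, 1, 2), date(2013, 1, 3), date(2013, 1, 4),
    date(2013, 1, 8), date(2013, 1, 10), date(2013, 1, 15), date(2013, 1, 24),
]

def weekly_counts(dates, start, end):
    """Count dates per 7-day bucket from `start` to `end`, together with the
    same bucket one calendar year earlier (hypothetical helper).
    Note: replace(year=...) would fail for a week starting on Feb 29."""
    rows = []
    week_start = start
    while week_start <= end:
        week_end = week_start + timedelta(days=7)   # exclusive upper bound
        last_start = week_start.replace(year=week_start.year - 1)
        last_end = last_start + timedelta(days=7)
        this_yr = sum(week_start <= d < week_end for d in dates)
        last_yr = sum(last_start <= d < last_end for d in dates)
        rows.append((week_start, this_yr, last_yr))
        week_start = week_end
    return rows

rows = weekly_counts(EXAM_DATES, date(2013, 1, 1), date(2013, 1, 31))
for series, this_yr, last_yr in rows:
    print(series, this_yr, last_yr)
# 2013-01-01 4 2
# 2013-01-08 2 2
# 2013-01-15 1 0
# 2013-01-22 1 0
# 2013-01-29 0 0
```

The second row shows 2 for this year, which agrees with the query result above rather than with the sample output in the question.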



Answer 2:


The simple approach would be to solve this with a CROSS JOIN as demonstrated by @jpw. However, there are some hidden problems:

  1. The performance of an unconditional CROSS JOIN deteriorates quickly with a growing number of rows. The total number of rows is multiplied by the number of weeks you are testing for, before this huge derived table can be processed in the aggregation. Indexes can't help.

  2. Starting weeks with January 1st leads to inconsistencies. ISO weeks might be an alternative. See below.

All of the following queries make heavy use of an index on exam_date. Be sure to have one.

Only join to relevant rows

Should be much faster:

SELECT d.day, d.thisyr
     , count(t.exam_date) AS lastyr
FROM  (
   SELECT d.day::date, (d.day - '1 year'::interval)::date AS day0  -- for 2nd join
        , count(t.exam_date) AS thisyr
   FROM   generate_series('2013-01-01'::date
                        , '2013-01-31'::date  -- last week overlaps with Feb.
                        , '7 days'::interval) d(day)  -- returns timestamp
   LEFT   JOIN tbl t ON t.exam_date >= d.day::date
                    AND t.exam_date <  d.day::date + 7
   GROUP  BY d.day
   ) d
LEFT   JOIN tbl t ON t.exam_date >= d.day0      -- repeat with last year
                 AND t.exam_date <  d.day0 + 7
GROUP  BY d.day, d.thisyr
ORDER  BY d.day;

This is with weeks starting from Jan. 1st like in your original. As commented, this produces a couple of inconsistencies: Weeks start on a different day each year and since we cut off at the end of the year, the last week of the year consists of just 1 or 2 days (leap year).
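
Both inconsistencies are easy to see with a small Python sketch. The helper name `jan1_week_facts` is made up for this illustration; it reports the ISO weekday of January 1st (1 = Monday, 7 = Sunday) and how many days are left over for the final week when weeks are counted in 7-day steps from January 1st.

```python
from datetime import date

def jan1_week_facts(year):
    """Return (ISO weekday of Jan 1, days in the last Jan-1-based week).
    Hypothetical helper for illustration only."""
    jan1 = date(year, 1, 1)
    days_in_year = (date(year + 1, 1, 1) - jan1).days
    # 365 % 7 == 1 and 366 % 7 == 2, so the final week is 1 or 2 days long.
    last_week_days = days_in_year % 7
    return jan1.isoweekday(), last_week_days

for y in (2012, 2013, 2014):
    print(y, jan1_week_facts(y))
# 2012 (7, 2)   Sunday, leap year
# 2013 (2, 1)   Tuesday
# 2014 (3, 1)   Wednesday
```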

The same with ISO weeks

Depending on requirements, consider ISO weeks instead, which start on Mondays and always span 7 days. But they cross the border between years. Per documentation on EXTRACT():

week

The number of the week of the year that the day is in. By definition (ISO 8601), weeks start on Mondays and the first week of a year contains January 4 of that year. In other words, the first Thursday of a year is in week 1 of that year.

In the ISO definition, it is possible for early-January dates to be part of the 52nd or 53rd week of the previous year, and for late-December dates to be part of the first week of the next year. For example, 2005-01-01 is part of the 53rd week of year 2004, and 2006-01-01 is part of the 52nd week of year 2005, while 2012-12-31 is part of the first week of 2013. It's recommended to use the isoyear field together with week to get consistent results.
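
The three examples from the documentation can be reproduced with Python's `date.isocalendar()`, which follows the same ISO 8601 rules. This is only a cross-check, not Postgres code; `iso_year_week` is a made-up helper name.

```python
from datetime import date

def iso_year_week(d):
    """Return (ISO year, ISO week number) for a date, comparable to
    EXTRACT(isoyear ...) and EXTRACT(week ...) in Postgres."""
    iso = d.isocalendar()
    return iso[0], iso[1]

for d in (date(2005, 1, 1), date(2006, 1, 1), date(2012, 12, 31)):
    print(d, iso_year_week(d))
# 2005-01-01 (2004, 53)
# 2006-01-01 (2005, 52)
# 2012-12-31 (2013, 1)
```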

Above query rewritten with ISO weeks:

SELECT w + 1 AS isoweek  -- w is a 0-based offset; ISO week numbers start at 1
     , day::text  AS thisyr_monday, thisyr_ct
     , day0::text AS lastyr_monday, count(t.exam_date) AS lastyr_ct
FROM  (
   SELECT w, day
        , date_trunc('week', '2012-01-04'::date)::date + 7 * w AS day0
        , count(t.exam_date) AS thisyr_ct
   FROM  (
      SELECT w
           , date_trunc('week', '2013-01-04'::date)::date + 7 * w AS day
      FROM   generate_series(0, 4) w
      ) d
   LEFT   JOIN tbl t ON t.exam_date >= d.day
                    AND t.exam_date <  d.day + 7
   GROUP  BY d.w, d.day
   ) d
LEFT   JOIN tbl t ON t.exam_date >= d.day0     -- repeat with last year
                 AND t.exam_date <  d.day0 + 7
GROUP  BY d.w, d.day, d.day0, d.thisyr_ct
ORDER  BY d.w, d.day;

January 4th is always in the first ISO week of the year. So this expression gets the date of Monday of the first ISO week of the given year:

date_trunc('week', '2012-01-04'::date)::date
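
For comparison, the same date arithmetic in Python: step back from January 4th to the Monday of its week. The function name `first_iso_monday` is an assumption for this sketch.

```python
from datetime import date, timedelta

def first_iso_monday(year):
    """Monday of ISO week 1 of `year`: the Monday on or before January 4th."""
    jan4 = date(year, 1, 4)
    return jan4 - timedelta(days=jan4.weekday())   # weekday(): Monday == 0

for y in (2012, 2013, 2014):
    print(y, first_iso_monday(y))
# 2012 2012-01-02
# 2013 2012-12-31
# 2014 2013-12-30
```

Note that the first ISO week of 2013 starts on 2012-12-31, so the weekly boundaries differ from the January 1st based series in the question.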

Simplify with EXTRACT()

Since ISO weeks coincide with the week numbers returned by EXTRACT(), we can simplify the query. First, a short and simple form:

SELECT w AS isoweek
     , COALESCE(thisyr_ct, 0) AS thisyr_ct
     , COALESCE(lastyr_ct, 0) AS lastyr_ct
FROM   generate_series(1, 5) w
LEFT   JOIN (
   SELECT EXTRACT(week FROM exam_date)::int AS w, count(*) AS thisyr_ct
   FROM   tbl
   WHERE  EXTRACT(isoyear FROM exam_date)::int = 2013
   GROUP  BY 1
   ) t13  USING (w)
LEFT   JOIN (
   SELECT EXTRACT(week FROM exam_date)::int AS w, count(*) AS lastyr_ct
   FROM   tbl
   WHERE  EXTRACT(isoyear FROM exam_date)::int = 2012
   GROUP  BY 1
   ) t12  USING (w);
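
To see what this grouping returns for the sample data in the question, here is a rough Python equivalent. It assumes the twelve example rows are the whole table; `EXAM_DATES` and `iso_week_table` are illustrative names, and the `Counter` default of 0 plays the role of COALESCE.

```python
from collections import Counter
from datetime import date

# Sample exam dates copied from the question.
EXAM_DATES = [
    date(2012, 1, 2), date(2012, 1, 2), date(2012, 1, 8), date(2012, 1, 8),
    date(2013, 1, 2), date(2013, 1, 2), date(2013, 1, 3), date(2013, 1, 4),
    date(2013, 1, 8), date(2013, 1, 10), date(2013, 1, 15), date(2013, 1, 24),
]

def iso_week_table(dates, this_year, weeks):
    """Return (isoweek, thisyr_ct, lastyr_ct) rows, grouping by ISO year and week."""
    counts = Counter(tuple(d.isocalendar())[:2] for d in dates)
    # Missing (year, week) keys count as 0, like COALESCE(ct, 0).
    return [(w, counts[(this_year, w)], counts[(this_year - 1, w)]) for w in weeks]

for row in iso_week_table(EXAM_DATES, 2013, range(1, 6)):
    print(row)
# (1, 4, 4)
# (2, 2, 0)
# (3, 1, 0)
# (4, 1, 0)
# (5, 0, 0)
```

The counts for last year differ from the January 1st based result: 2012-01-08 is a Sunday and therefore still belongs to ISO week 1 of 2012.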

Optimized query

The same with more details and optimized for performance:

WITH params AS (          -- enter parameters here, once 
   SELECT date_trunc('week', '2012-01-04'::date)::date AS last_start
        , date_trunc('week', '2013-01-04'::date)::date AS this_start
        , date_trunc('week', '2014-01-04'::date)::date AS next_start
        , 1 AS week_1
        , 5 AS week_n     -- show weeks 1 - 5
   )
SELECT w.w AS isoweek
     , p.this_start + 7 * (w - 1) AS thisyr_monday
     , COALESCE(t13.ct, 0) AS thisyr_ct
     , p.last_start + 7 * (w - 1) AS lastyr_monday
     , COALESCE(t12.ct, 0) AS lastyr_ct
FROM params p
   , generate_series(p.week_1, p.week_n) w(w)
LEFT   JOIN (
   SELECT EXTRACT(week FROM t.exam_date)::int AS w, count(*) AS ct
   FROM   tbl t, params p
   WHERE  t.exam_date >= p.this_start      -- only relevant dates
   AND    t.exam_date <  p.this_start + 7 * (p.week_n - p.week_1 + 1)::int
-- AND    t.exam_date <  p.next_start      -- don't cross over into next year
   GROUP  BY 1
   ) t13  USING (w)
LEFT   JOIN (                              -- same for last year
   SELECT EXTRACT(week FROM t.exam_date)::int AS w, count(*) AS ct
   FROM   tbl t, params p
   WHERE  t.exam_date >= p.last_start
   AND    t.exam_date <  p.last_start + 7 * (p.week_n - p.week_1 + 1)::int
-- AND    t.exam_date <  p.this_start
   GROUP  BY 1
   ) t12  USING (w);

This should be very fast with index support and can easily be adapted to intervals of choice. The implicit JOIN LATERAL for generate_series() in the last query requires Postgres 9.3.

SQL Fiddle.



Source: https://stackoverflow.com/questions/26834062/total-number-of-records-per-week
