Fill with zeros in a frequency of dates grouped by month and year in BigQuery

Submitted by こ雲淡風輕ζ on 2021-02-11 13:58:54

Question


I have a table with the columns hireDate (DATE), First Name (STRING) and Surname (STRING), like this:

hireDate    First Name      Surname
13-oct-14   Cintia Roxana   Padilla Julca
28-oct-14   Conor           McAteer
28-oct-14   Paolo           Mesia Macher
28-oct-14   William Anthony Whelan
15-nov-14   Peter Michael   Coates
13-feb-15   Natalie         Conche
15-mar-15   Beatriz         Vargas Huanca
01-may-15   Walter          Calle Chenccnes
04-may-15   Sarah Louise    Price

I made a view with the frequency of each hireDate (DATE) and, in another column, the cumulative frequency, like this:

Row hireDate    Count       Cumulative
1   13/10/2014  1           1
2   28/10/2014  3           4
3   15/11/2014  1           5
4   13/02/2015  1           6
5   15/03/2015  1           7
6   09/04/2015  1           8
7   15/04/2015  1           9
8   01/05/2015  1           10

This is the query:

WITH
  Data AS (
    SELECT
      hireDate,
      COUNT(1) AS Count
    FROM
      `human-resources-221122.human_resources.employees_view`
    WHERE
      status <> "cancelled"
    GROUP BY
      1 )

SELECT
  hireDate,
  Count,
  SUM(Count) OVER (ORDER BY hireDate ASC
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Cumulative
FROM
  Data
ORDER BY
  hireDate ASC
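To see why empty months never appear, here is a small Python sketch (not BigQuery) of what the query above computes. It assumes the hire dates are already loaded into a plain list with the status filter applied; the function name daily_frequency is hypothetical. Grouping only produces rows for dates that actually occur, so a month with no hires, such as December 2014, simply has no row.

```python
from collections import Counter
from datetime import date
from itertools import accumulate


def daily_frequency(hire_dates):
    """Return (hireDate, Count, Cumulative) rows, only for dates that occur.

    Counter plays the role of COUNT(1) ... GROUP BY hireDate, and
    accumulate() plays the role of the running SUM(Count) OVER (...).
    """
    counts = Counter(hire_dates)
    days = sorted(counts)  # ORDER BY hireDate ASC
    running = accumulate(counts[d] for d in days)
    return [(d, counts[d], cum) for d, cum in zip(days, running)]


# Hire dates taken from the view output in the question
# (assumption: cancelled rows were already filtered out).
hire_dates = [
    date(2014, 10, 13),
    date(2014, 10, 28), date(2014, 10, 28), date(2014, 10, 28),
    date(2014, 11, 15),
    date(2015, 2, 13),
    date(2015, 3, 15),
    date(2015, 4, 9), date(2015, 4, 15),
    date(2015, 5, 1),
]

for row in daily_frequency(hire_dates):
    print(row)
```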

But I need to see the counts by month and year, with zeros for the months that have no hires, something like this:

Hire_Month  Hire_Year   Count   Cumulative
October     2014        4       4
November    2014        1       5
December    2014        0       5
January     2015        0       5
February    2015        1       6
March       2015        1       7
April       2015        2       9
May         2015        1       10

Thanks in advance.


Answer 1:


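Use GENERATE_DATE_ARRAY to build one row per month between the earliest and the latest date, then RIGHT JOIN the data onto that list of months so that months without data still appear (IFNULL turns their empty sum into 0):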

WITH data AS (
  SELECT * 
  FROM UNNEST ([
    STRUCT(DATE("2014-12-03") AS d, 4 AS a)
    , STRUCT("2015-01-05", 7)
    , STRUCT("2015-03-05", 1)
  ])
), all_months AS (
   SELECT month
   FROM UNNEST(GENERATE_DATE_ARRAY(
     (SELECT DATE_TRUNC(MIN(d), MONTH) FROM data)
     , (SELECT MAX(d) FROM data)
     , INTERVAL 1 MONTH)
   ) AS month
)


SELECT month, IFNULL(SUM(a),0) a, SUM(SUM(a)) OVER(ORDER BY month) a_cum
FROM data 
RIGHT JOIN all_months
ON DATE_TRUNC(d, MONTH)=month
GROUP BY month
ORDER BY month
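The same steps can be traced outside BigQuery with a short Python sketch, assuming the data fits in memory as (date, amount) pairs; month_starts and monthly_sums are hypothetical names. month_starts stands in for GENERATE_DATE_ARRAY starting at the truncated minimum month, the dictionary lookup with a default of 0 stands in for the RIGHT JOIN plus IFNULL, and the running variable stands in for SUM(SUM(a)) OVER(ORDER BY month).

```python
from datetime import date


def month_starts(first, last):
    """Yield the 1st of every month from first's month through last's month."""
    y, m = first.year, first.month
    while (y, m) <= (last.year, last.month):
        yield date(y, m, 1)
        m += 1
        if m == 13:
            y, m = y + 1, 1


def monthly_sums(rows):
    """rows: list of (date, amount). Returns (month, total, running_total)."""
    totals = {}
    for d, a in rows:
        key = d.replace(day=1)  # like DATE_TRUNC(d, MONTH)
        totals[key] = totals.get(key, 0) + a
    first = min(d for d, _ in rows)
    last = max(d for d, _ in rows)
    result, running = [], 0
    for month in month_starts(first, last):
        total = totals.get(month, 0)  # missing month -> 0, like IFNULL(SUM(a), 0)
        running += total  # cumulative sum ordered by month
        result.append((month, total, running))
    return result


# Same sample rows as the UNNEST([...]) in the query above.
data = [(date(2014, 12, 3), 4), (date(2015, 1, 5), 7), (date(2015, 3, 5), 1)]
for row in monthly_sums(data):
    print(row)
```

February 2015 has no input row but still comes out with a total of 0 and a carried-forward running total, which is the effect the RIGHT JOIN achieves in SQL.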

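If you are only counting rows, you can rely on the fact that the outer (LEFT/RIGHT) JOIN leaves d as NULL for empty months, and COUNT(d) skips NULLs, so those months come out as 0 without needing IFNULL. This is how the query could be adapted to an arbitrary table (Wikipedia page views here):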

WITH data AS (
  SELECT *, DATE(datehour) d
  FROM `fh-bigquery.wikipedia_v3.pageviews_2018` 
  WHERE wiki='pt'
    AND (datehour BETWEEN '2018-09-30' AND '2018-09-30'
      OR datehour BETWEEN '2018-12-01' AND '2018-12-02'
    )
    AND title LIKE 'Calif%'
), all_months AS (
   SELECT month
   FROM UNNEST(GENERATE_DATE_ARRAY(
     (SELECT DATE_TRUNC(MIN(d), MONTH) FROM data)
     , (SELECT MAX(d) FROM data)
     , INTERVAL 1 MONTH)
   ) AS month
)


SELECT month, COUNT(d) c, SUM(COUNT(d)) OVER(ORDER BY month) a_cum
FROM data 
RIGHT JOIN all_months
ON DATE_TRUNC(d, MONTH)=month
GROUP BY month
ORDER BY month
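Applied to the hire dates from the question, the counting approach yields exactly the requested layout. Below is a Python sketch of that logic (not BigQuery), assuming the dates from the question's view output are in a list; hires_by_month is a hypothetical name, and calendar.month_name gives English month names under Python's default locale.

```python
import calendar
from collections import Counter
from datetime import date


def hires_by_month(hire_dates):
    """Return (Hire_Month, Hire_Year, Count, Cumulative) for every month in range."""
    # Count per month, like COUNT(d) grouped by DATE_TRUNC(d, MONTH).
    counts = Counter(d.replace(day=1) for d in hire_dates)
    first, last = min(counts), max(counts)
    rows, running = [], 0
    y, m = first.year, first.month
    while (y, m) <= (last.year, last.month):
        n = counts.get(date(y, m, 1), 0)  # unmatched month -> 0, as COUNT skips NULLs
        running += n
        rows.append((calendar.month_name[m], y, n, running))
        y, m = (y + 1, 1) if m == 12 else (y, m + 1)
    return rows


# Hire dates from the question's view output (assumed already filtered by status).
hire_dates = [
    date(2014, 10, 13),
    date(2014, 10, 28), date(2014, 10, 28), date(2014, 10, 28),
    date(2014, 11, 15),
    date(2015, 2, 13),
    date(2015, 3, 15),
    date(2015, 4, 9), date(2015, 4, 15),
    date(2015, 5, 1),
]

for row in hires_by_month(hire_dates):
    print(row)
```

In BigQuery itself, the month and year labels should be obtainable in the final SELECT from the generated month column with FORMAT_DATE('%B', month) and EXTRACT(YEAR FROM month).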



Source: https://stackoverflow.com/questions/54693016/fill-with-zeros-in-a-frequency-of-dates-group-by-month-and-year-in-big-query
