Compute percentiles by group in BigQuery

Submitted by 99封情书 on 2021-01-29 05:11:08

Question


After searching around, I could not find a solution for this. Consider the following example:

with
  my_data as (
    select 1 as num, 'a' as letter union all
    select 2 as num, 'a' as letter union all
    select 3 as num, 'a' as letter union all
    select 4 as num, 'a' as letter union all
    select 5 as num, 'a' as letter union all
    select 6 as num, 'b' as letter union all
    select 7 as num, 'b' as letter union all
    select 8 as num, 'b' as letter union all
    select 9 as num, 'b' as letter union all
    select 10 as num, 'b' as letter
  )

select
  letter,
  approx_quantiles(num, 100) as value
from my_data
group by letter

We are looking to compute the 0 - 100 quantiles of the num column, grouped by letter. The current query returns only 2 rows, because the value column is an array. What we need is for the above query to return 202 rows (101 percentiles for each of the 2 letters), structured as follows:

letter value pctile
     a     1      0
     a     1      1
     a     1      2
     a     1      3
     a     1      4
...
     b     1      0
     b     1      1
     b     1      2
     b     1      3
     b     1      4

...where the pctile column runs from 0 to 100, and the value column holds the value associated with the percentile in the pctile column. This isn't the best example, because we are trying to compute the 0 - 100 percentiles and the example data only has 10 rows, but I think it is sufficient to illustrate the problem.


Answer 1:


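The query below is for BigQuery Standard SQL. It flattens the array returned by APPROX_QUANTILES into one row per element and uses each element's offset in the array as the percentile: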

#standardSQL
SELECT letter, value, pctile
FROM (
  SELECT
    letter,
    APPROX_QUANTILES(num, 100) AS value
  FROM my_data
  GROUP BY letter
) t, t.value WITH OFFSET AS pctile
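
For reference, the same idea can be written with an explicit UNNEST, which some readers find easier to follow than the comma join on t.value. The following is only a sketch under a few assumptions: the names quantiles, quantile_value and q are made up for illustration, the sample data is generated with GENERATE_ARRAY instead of the UNION ALL chain from the question, and the ORDER BY is added purely for readability. APPROX_QUANTILES(num, 100) returns an array of 101 elements (minimum, 99 interior boundaries, maximum), so the offset of each element can serve as the percentile.

```sql
#standardSQL
WITH my_data AS (
  -- Same sample rows as in the question: 1-5 are 'a', 6-10 are 'b'.
  SELECT num, IF(num <= 5, 'a', 'b') AS letter
  FROM UNNEST(GENERATE_ARRAY(1, 10)) AS num
)
SELECT
  q.letter,
  quantile_value AS value,  -- approximate quantile boundary
  pctile                    -- 0-based position in the array, 0 through 100
FROM (
  -- One row per letter, holding an array of 101 approximate quantiles.
  SELECT letter, APPROX_QUANTILES(num, 100) AS quantiles
  FROM my_data
  GROUP BY letter
) AS q
-- Expand each array into one row per element, keeping the element's offset.
CROSS JOIN UNNEST(q.quantiles) AS quantile_value WITH OFFSET AS pctile
ORDER BY q.letter, pctile
```

With two letters this should produce 2 x 101 = 202 rows. Note that APPROX_QUANTILES is an approximate aggregate, so on large inputs the values are estimates rather than exact percentiles.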


Source: https://stackoverflow.com/questions/64218911/compute-percentiles-by-group-in-bigquery
