BigQuery apply rank / percent_rank to column with a WHERE clause

Submitted by 喜欢而已 on 2021-01-29 13:48:04

Question


I have a fairly wide BigQuery table with ~20-30 columns, each of which needs a complementary percentile column showing that column's percentile value relative to all other rows in the table. However, a column should only receive a percentile value if the value in another column meets a certain threshold. To showcase this, I created a reproducible example below:

WITH
  correct_games_played AS
    (
      SELECT "a" as name, 7 as num1, 0.4 as num2, 0.55 as num3
      UNION ALL SELECT "b" as name, 13 as num1, 0.53 as num2, 0.37 as num3
      UNION ALL SELECT "c" as name, 4 as num1, 0.42 as num2, 0.32 as num3
      UNION ALL SELECT "d" as name, 17 as num1, 0.6 as num2, 0.23 as num3
      UNION ALL SELECT "e" as name, 7 as num1, 0.3 as num2, 0.25 as num3
      UNION ALL SELECT "f" as name, 16 as num1, 0.7 as num2, 0.43 as num3
      UNION ALL SELECT "g" as name, 10 as num1, 0.53 as num2, 0.52 as num3
      UNION ALL SELECT "h" as name, 5 as num1, 0.54 as num2, 0.21 as num3
      UNION ALL SELECT "i" as name, 9 as num1, 0.56 as num2, 0.17 as num3
      UNION ALL SELECT "j" as name, 3 as num1, 0.75 as num2, 0.53 as num3
    )

  SELECT 
    a.*,
    -- RANK() OVER(ORDER BY a.num1 DESC) AS num1_rank,
    -- RANK() OVER(ORDER BY a.num2 DESC) AS num2_rank,
    -- RANK() OVER(ORDER BY a.num3 DESC) AS num3_rank
    RANK() OVER(ORDER BY a.num1 DESC) AS num1_rank,
    RANK() OVER(ORDER BY a.num2 WHERE a.num1 > 4 DESC) AS num2_rank
    RANK() OVER(ORDER BY a.num3 WHERE a.num1 > 3 DESC) AS num3_rank
  FROM correct_games_played as a

This script throws the error Syntax error: Expected ")" but got keyword WHERE at [22:37]; however, it works if I replace the RANK() calls with the commented-out ones. My objective is really quite simple:

  • num2_rank: only rank values in a.num2 if a.num1 is greater than 4, otherwise display a null value
  • num3_rank: only rank values in a.num3 if a.num1 is greater than 3, otherwise display a null value

My table is quite wide, and there's a chance that each column will require its own condition to determine whether its rows' values should be ranked. Any help with this would be greatly appreciated, thanks!


Answer 1:


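The query below is for BigQuery Standard SQL. The OVER clause does not accept a WHERE condition, so the condition moves into IF expressions instead: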

#standardSQL
WITH correct_games_played AS (
  SELECT "a" AS name, 7 AS num1, 0.4 AS num2, 0.55 AS num3 UNION ALL 
  SELECT "b" AS name, 13 AS num1, 0.53 AS num2, 0.37 AS num3 UNION ALL 
  SELECT "c" AS name, 4 AS num1, 0.42 AS num2, 0.32 AS num3 UNION ALL 
  SELECT "d" AS name, 17 AS num1, 0.6 AS num2, 0.23 AS num3 UNION ALL 
  SELECT "e" AS name, 7 AS num1, 0.3 AS num2, 0.25 AS num3 UNION ALL 
  SELECT "f" AS name, 16 AS num1, 0.7 AS num2, 0.43 AS num3 UNION ALL 
  SELECT "g" AS name, 10 AS num1, 0.53 AS num2, 0.52 AS num3 UNION ALL 
  SELECT "h" AS name, 5 AS num1, 0.54 AS num2, 0.21 AS num3 UNION ALL 
  SELECT "i" AS name, 9 AS num1, 0.56 AS num2, 0.17 AS num3 UNION ALL 
  SELECT "j" AS name, 3 AS num1, 0.75 AS num2, 0.53 AS num3
)
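SELECT *,
  RANK() OVER(ORDER BY num1 DESC) AS num1_rank,
  -- Inner IF: non-qualifying rows sort as NULL, which BigQuery places last under DESC,
  -- so they never push a qualifying row down. Outer IF: blanks their own rank to NULL.
  IF(num1 > 4, RANK() OVER(ORDER BY IF(num1 > 4, num2, NULL) DESC), NULL) AS num2_rank,
  IF(num1 > 3, RANK() OVER(ORDER BY IF(num1 > 3, num3, NULL) DESC), NULL) AS num3_rank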
FROM correct_games_played
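
Since the title also asks about PERCENT_RANK, here is a hedged sketch of a variant that partitions on the condition instead of nulling out the sort key. It relies on PERCENT_RANK being computed as (rank - 1) / (rows in partition - 1), so the non-qualifying rows have to be kept out of the partition, not just sorted last. The column name num2_pct_rank is made up for illustration, and the threshold is the one from the question; this is one possible approach, not the only one.

```sql
#standardSQL
WITH correct_games_played AS (
  SELECT "a" AS name, 7 AS num1, 0.4 AS num2, 0.55 AS num3 UNION ALL
  SELECT "b" AS name, 13 AS num1, 0.53 AS num2, 0.37 AS num3 UNION ALL
  SELECT "c" AS name, 4 AS num1, 0.42 AS num2, 0.32 AS num3 UNION ALL
  SELECT "d" AS name, 17 AS num1, 0.6 AS num2, 0.23 AS num3 UNION ALL
  SELECT "e" AS name, 7 AS num1, 0.3 AS num2, 0.25 AS num3 UNION ALL
  SELECT "f" AS name, 16 AS num1, 0.7 AS num2, 0.43 AS num3 UNION ALL
  SELECT "g" AS name, 10 AS num1, 0.53 AS num2, 0.52 AS num3 UNION ALL
  SELECT "h" AS name, 5 AS num1, 0.54 AS num2, 0.21 AS num3 UNION ALL
  SELECT "i" AS name, 9 AS num1, 0.56 AS num2, 0.17 AS num3 UNION ALL
  SELECT "j" AS name, 3 AS num1, 0.75 AS num2, 0.53 AS num3
)
SELECT *,
  -- PARTITION BY the boolean condition: qualifying rows are ranked only among themselves.
  -- The outer IF blanks the result for rows in the non-qualifying partition.
  IF(num1 > 4,
     RANK() OVER(PARTITION BY num1 > 4 ORDER BY num2 DESC),
     NULL) AS num2_rank,
  -- Hypothetical column name; the denominator is now the count of qualifying rows minus one.
  IF(num1 > 4,
     PERCENT_RANK() OVER(PARTITION BY num1 > 4 ORDER BY num2 DESC),
     NULL) AS num2_pct_rank
FROM correct_games_played
```

With the sample data, eight of the ten rows have num1 > 4, so the percent rank is divided by 7 here. Ordering by IF(num1 > 4, num2, NULL) in a single unpartitioned window would divide by 9 instead, because the NULL-keyed rows still count toward the partition size. Partitioning also keeps working with ascending order, where BigQuery sorts NULLs first.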


Source: https://stackoverflow.com/questions/58794322/bigquery-apply-rank-percent-rank-to-column-with-a-where-clause
