Facing an issue understanding a BigQuery stored procedure used for pivoting

Submitted by 瘦欲@ on 2020-07-10 10:26:52

Question


I am trying to pivot a table using a BigQuery stored procedure, as explained in this link.

Input Table:

Output Required:

The first part of the stored procedure generates the list of all values that will be used to create the new columns:

EXECUTE IMMEDIATE (
    "SELECT STRING_AGG(' "||aggregation
    ||"""(IF('||@pivot_col_name||'="'||x.value||'", '||@pivot_col_value||', null)) '||x.value)
   FROM UNNEST((
       SELECT APPROX_TOP_COUNT("""||pivot_col_name||", @max_columns) FROM `"||table_name||"`)) x"
  ) INTO header_pivot 
  USING pivot_col_name AS pivot_col_name, pivot_col_value AS pivot_col_value, max_columns AS max_columns;

This query generates the following output:

MAX(IF(EXTENDED_PROPERTY_KEY="key1", EXTENDED_PROPERTY_VALUE, null)) key1, 
MAX(IF(EXTENDED_PROPERTY_KEY="key2", EXTENDED_PROPERTY_VALUE, null)) key2,
MAX(IF(EXTENDED_PROPERTY_KEY="key3", EXTENDED_PROPERTY_VALUE, null)) key3, 
MAX(IF(EXTENDED_PROPERTY_KEY="key4", EXTENDED_PROPERTY_VALUE, null)) key4
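To see what this first EXECUTE IMMEDIATE step does, here is a minimal Python model of it (a sketch, not BigQuery itself). It assumes the table is available as a list of dicts and uses collections.Counter.most_common as an exact stand-in for APPROX_TOP_COUNT, which is approximate in BigQuery. The function name build_pivot_header and the sample rows are hypothetical. Each frequent value becomes one aggregation(IF(...)) column expression, and the pieces are joined with a comma, STRING_AGG's default delimiter.

```python
from collections import Counter


def build_pivot_header(rows, aggregation, pivot_col_name, pivot_col_value, max_columns):
    """Build the column list that the first EXECUTE IMMEDIATE stores in header_pivot.

    rows is a hypothetical in-memory stand-in for the source table.
    """
    # Stand-in for APPROX_TOP_COUNT(pivot_col_name, max_columns): most frequent
    # values first (exact here, approximate in BigQuery).
    top_values = Counter(row[pivot_col_name] for row in rows).most_common(max_columns)
    # One ' AGG(IF(col="value", value_col, null)) value' piece per top value,
    # including the leading space that the procedure's string literal adds.
    pieces = [
        f' {aggregation}(IF({pivot_col_name}="{value}", {pivot_col_value}, null)) {value}'
        for value, _count in top_values
    ]
    # STRING_AGG's default delimiter is a comma.
    return ",".join(pieces)


# Hypothetical sample data using the column names from the question.
rows = [
    {"ACCOUNT_ID": 1, "EXTENDED_PROPERTY_KEY": "key1", "EXTENDED_PROPERTY_VALUE": "a"},
    {"ACCOUNT_ID": 1, "EXTENDED_PROPERTY_KEY": "key2", "EXTENDED_PROPERTY_VALUE": "b"},
    {"ACCOUNT_ID": 2, "EXTENDED_PROPERTY_KEY": "key1", "EXTENDED_PROPERTY_VALUE": "c"},
]
header_pivot = build_pivot_header(
    rows, "MAX", "EXTENDED_PROPERTY_KEY", "EXTENDED_PROPERTY_VALUE", max_columns=10
)
print(header_pivot)
```

The result is only text: one column expression per distinct key, in the same shape as the output shown above.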

The second part is:

SELECT STRING_AGG(''||(i+1)) FROM UNNEST(row_ids) WITH OFFSET i

This is where I am stuck: the offset value does not increase by 1 for each value in row_ids (Account_id).
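A likely source of the confusion: this subquery reads from nothing but UNNEST(row_ids), so row_ids cannot be a table column. It has to be a script variable or procedure parameter, and since STRING_AGG(x) over it works, it is an array of STRING column names (for example ['account_id']), not of account ids. The subquery runs once, inside the script, over that array. WITH OFFSET i yields each element's 0-based position, and ''||(i+1) turns it into a 1-based ordinal string, so the offset counts grouping columns, not accounts. A minimal Python model (group_by_ordinals and the second column name "region" are hypothetical):

```python
def group_by_ordinals(row_ids):
    """Model of: SELECT STRING_AGG(''||(i+1)) FROM UNNEST(row_ids) WITH OFFSET i

    row_ids is a list of column names, not a list of account ids.
    """
    # enumerate() plays the role of WITH OFFSET i (0-based position);
    # str(i + 1) plays the role of ''||(i+1), which casts the INT64 to STRING.
    return ",".join(str(i + 1) for i, _column_name in enumerate(row_ids))


print(group_by_ordinals(["account_id"]))            # 1
print(group_by_ordinals(["account_id", "region"]))  # 1,2
```

With a single grouping column the result is always 1, which becomes the positional GROUP BY 1 and ORDER BY 1 of the generated query.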

The approach did work for me after some manipulation, but I want to understand the underlying query that generates the desired output. I tried to break it down and arrived at this final query:

SELECT (SELECT STRING_AGG(x) FROM UNNEST([row_ids]) x), # this part extracts individual row_ids
       (SELECT STRING_AGG(DISTINCT "MAX(IF(pivot_key= '" || pivot_key || "', pivot_value, NULL)) AS " || pivot_key)
        FROM `project.dataset.table`
       ) # generates a string for each key of the pivot_key column
FROM `project.dataset.table`
GROUP BY (SELECT STRING_AGG(''||(i+1)) FROM UNNEST([row_ids]) WITH OFFSET i) # generates an offset for each row_id, though in this case it is always 1
ORDER BY (SELECT STRING_AGG(''||(i+1)) FROM UNNEST([row_ids]) WITH OFFSET i)

But when I run the combined query above, it fails at two points:

a) SELECT STRING_AGG(x) FROM UNNEST([row_ids]) x fails with an error saying that STRING_AGG() requires STRING values. In my case row_id is an integer, so I changed it to SELECT STRING_AGG('' || x) FROM UNNEST([row_ids]) x.

b) After fixing a), it fails again, saying that the UNNEST expression references column row_id, which is neither grouped nor aggregated.
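Both errors point at the same difference. The broken-down query evaluates the row_ids subqueries against a table column, once per row, whereas the procedure evaluates them against its array parameter before the final query exists, so they contribute only fixed text such as account_id and 1 (and STRING_AGG(x) works there because the array holds STRING column names, not INT64 ids). The question does not quote the step that joins the pieces, so the sketch below is a hypothetical reconstruction (build_pivot_query is an illustrative name) of how such a procedure could assemble the final statement; the linked code may concatenate it slightly differently.

```python
def build_pivot_query(table_name, row_ids, header_pivot):
    """Hypothetical reconstruction of how the final pivot statement is assembled.

    Every piece is plain text computed before the query runs; nothing here
    is evaluated per row of the table.
    """
    # (SELECT STRING_AGG(x) FROM UNNEST(row_ids) x): the grouping column names
    select_ids = ",".join(row_ids)
    # (SELECT STRING_AGG(''||(i+1)) FROM UNNEST(row_ids) WITH OFFSET i): their ordinals
    ordinals = ",".join(str(i + 1) for i, _ in enumerate(row_ids))
    return (
        f"SELECT {select_ids},{header_pivot} "
        f"FROM `{table_name}` GROUP BY {ordinals} ORDER BY {ordinals}"
    )


header_pivot = (
    ' MAX(IF(EXTENDED_PROPERTY_KEY="key1", EXTENDED_PROPERTY_VALUE, null)) key1,'
    ' MAX(IF(EXTENDED_PROPERTY_KEY="key2", EXTENDED_PROPERTY_VALUE, null)) key2'
)
print(build_pivot_query("project.dataset.table", ["account_id"], header_pivot))
```

With row_ids = ["account_id"] the generated text has the same shape as the static query given in the answer: the row id column, one MAX(IF(...)) per key, and GROUP BY 1 ORDER BY 1.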

Additionally, the generic names above map to my columns as follows:

Row_Id == Account_Id

Pivot_key == Extended_Property_Key

Pivot_value == Extended_Property_Value

Can someone explain what I am missing here?


Answer 1:


The final query executed in your case probably is:

SELECT
  account_id,
  MAX(IF(extended_property_key = "Key 3", extended_property_value, NULL)) e_Key3,
  MAX(IF(extended_property_key = "Key 2", extended_property_value, NULL)) e_Key2,
  MAX(IF(extended_property_key = "Key 1", extended_property_value, NULL)) e_Key1
FROM
  `your-table`
GROUP BY
  1
ORDER BY
  1

Keep in mind that this is just a static query generated by that script. Pivoting tables programmatically using plain SQL is not possible in BigQuery.
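To see why MAX(IF(...)) with GROUP BY 1 pivots the data, here is a small in-memory Python model of what that static query computes (pivot_rows and the sample rows are hypothetical). IF(...) keeps the value only on rows whose key matches and yields NULL elsewhere; MAX ignores NULLs, so each account collapses to the value for that key, or NULL if the account has no such key.

```python
def pivot_rows(rows, row_id, key_col, value_col, keys):
    """Model of: SELECT row_id, MAX(IF(key_col = k, value_col, NULL)) AS k, ...
    FROM table GROUP BY 1 ORDER BY 1
    """
    # GROUP BY 1: bucket the rows by the row id column.
    groups = {}
    for row in rows:
        groups.setdefault(row[row_id], []).append(row)

    result = []
    for rid in sorted(groups):  # ORDER BY 1
        out = {row_id: rid}
        for key in keys:
            # IF(...) yields the value on matching rows and NULL otherwise;
            # MAX skips NULLs, so only matching, non-NULL values count.
            values = [
                row[value_col]
                for row in groups[rid]
                if row[key_col] == key and row[value_col] is not None
            ]
            # If several rows share a key, MAX keeps the largest value;
            # if none match, the result is NULL (None).
            out[key] = max(values) if values else None
        result.append(out)
    return result


rows = [
    {"account_id": 1, "extended_property_key": "Key 1", "extended_property_value": "a"},
    {"account_id": 1, "extended_property_key": "Key 2", "extended_property_value": "b"},
    {"account_id": 2, "extended_property_key": "Key 1", "extended_property_value": "c"},
]
for out in pivot_rows(rows, "account_id", "extended_property_key",
                      "extended_property_value", ["Key 1", "Key 2"]):
    print(out)
```

The list of keys has to be known when the query text is written, which is why the procedure first discovers the keys and only then runs a fixed query of this shape.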



Source: https://stackoverflow.com/questions/62659762/facing-issue-in-understanding-bigquery-stored-procedure-used-in-pivoting
