BigQuery reshape table with structs from wide to long

这一生的挚爱 submitted on 2020-01-24 21:55:06

Question


I deleted the previous version of this post in favor of this cleaned-up posting with a reproducible example. I have a table of the following format:

WITH wide_stats AS (
  (
    SELECT 
      'joe' name, 'bills' team,
      struct(struct(7 as fga) as o, struct(8 as fga) as d) as t1,
      struct(struct(3 as fga) as o, struct(9 as fga) as d) as t2,
      struct(3 as pts, 9 as ast, 5 as reb) as t3,    
      7 tov, 3 blk
  ) UNION ALL (
    SELECT 'nick' name, 'jets' team,
      struct(struct(12 as fga) as o, struct(13 as fga) as d) as t1,
      struct(struct(15 as fga) as o, struct(22 as fga) as d) as t2,
      struct(13 as pts, 5 as ast, 15 as reb) as t3,    
      75 tov, 23 blk
  )
)

SELECT 
  name, team, metric, SAFE_CAST(value AS FLOAT64) value
FROM (
  SELECT 
    name, team, 
    REGEXP_REPLACE(SPLIT(pair, ':')[OFFSET(0)], r'^"|"$', '') metric, 
    REGEXP_REPLACE(SPLIT(pair, ':')[OFFSET(1)], r'^"|"$', '') value
  FROM wide_stats,
  UNNEST(SPLIT(REGEXP_REPLACE(to_json_string(wide_stats), r'{|}', ''))) pair
)

WHERE NOT LOWER(metric) IN ('name', 'team')

and I am working to reshape the table into the following output:

name   team    metric     value
joe    bills   t1_o_fga       7
joe    bills   t1_d_fga       8
joe    bills   t2_o_fga       3
joe    bills   t2_d_fga       9
joe    bills   t3_pts         3
joe    bills   t3_ast         9
joe    bills   t3_reb         5
joe    bills   tov            7
joe    bills   blk            3
nick   jets    t1_o_fga      12
nick   jets    t1_d_fga      13
nick   jets    t2_o_fga      15
nick   jets    t2_d_fga      22
nick   jets    t3_pts        13
nick   jets    t3_ast         5
nick   jets    t3_reb        15
nick   jets    tov           75
nick   jets    blk           23

The task is simple to explain: reshape from wide to long, but with structs and nested structs in the table. My regex attempt, adapted from another Stack Overflow post, splits the column names in the wrong way, so the current output doesn't match what is needed.
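To see where the split goes wrong, it can help to look at the intermediate fragments before any casting. The following is a minimal diagnostic sketch, assuming a cut-down one-row version of wide_stats (the CTE name one_row is introduced here only for illustration):

```sql
#standardSQL
-- Diagnostic sketch: list the raw fragments that the comma split produces
-- for a row that still contains nested structs.
WITH one_row AS (
  SELECT 'joe' name,
    STRUCT(STRUCT(7 AS fga) AS o, STRUCT(8 AS fga) AS d) AS t1,
    7 tov
)
SELECT
  TO_JSON_STRING(one_row) AS json_row,  -- nested structs stay nested JSON objects
  pair                                  -- one fragment per comma
FROM one_row,
-- r'[{}]' removes the same characters as r'{|}' in the query above
UNNEST(SPLIT(REGEXP_REPLACE(TO_JSON_STRING(one_row), r'[{}]', ''))) pair
```

Because the nested structs are serialized as nested objects, removing the braces leaves fragments such as "t1":"o":"fga":7, which contain more than one colon. Taking OFFSET(0) and OFFSET(1) of such a fragment returns t1 and "o" rather than a metric name and a number, and SAFE_CAST then turns the non-numeric value into NULL.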

The order of rows doesn't matter. For the metric names, it doesn't matter whether it's t1_o_fga, t1-o-fga, or t1/o/fga, so long as there's some separator and it's clear what the variable is. Any help or direction is super appreciated, thanks!


Answer 1:


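The query below is for BigQuery Standard SQL. It first flattens the nested struct fields into top-level columns (flat_stats), so that TO_JSON_STRING yields one flat key:value pair per metric and the split logic from the question works unchanged.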

#standardSQL
WITH wide_stats AS (
    SELECT 'joe' name, 'bills' team,
      STRUCT(STRUCT(7 AS fga) AS o, STRUCT(8 AS fga) AS d) AS t1,
      STRUCT(STRUCT(3 AS fga) AS o, STRUCT(9 AS fga) AS d) AS t2,
      STRUCT(3 AS pts, 9 AS ast, 5 AS reb) AS t3, 7 tov, 3 blk UNION ALL 
    SELECT 'nick' name, 'jets' team,
      STRUCT(STRUCT(12 AS fga) AS o, STRUCT(13 AS fga) AS d) AS t1,
      STRUCT(STRUCT(15 AS fga) AS o, STRUCT(22 AS fga) AS d) AS t2,
      STRUCT(13 AS pts, 5 AS ast, 15 AS reb) AS t3, 75 tov, 23 blk
), flat_stats AS (
  SELECT name, team,
    t1.o.fga AS t1_o_fga,
    t1.d.fga AS t1_d_fga,
    t2.o.fga AS t2_o_fga,
    t2.d.fga AS t2_d_fga,
    t3.pts AS t3_pts,
    t3.ast AS t3_ast,
    t3.reb AS t3_reb,
    tov, blk
  FROM wide_stats
)
SELECT name, team, metric, SAFE_CAST(value AS FLOAT64) value 
FROM (
  SELECT name, team, 
    REGEXP_REPLACE(SPLIT(pair, ':')[OFFSET(0)], r'^"|"$', '') metric, 
    REGEXP_REPLACE(SPLIT(pair, ':')[OFFSET(1)], r'^"|"$', '') value
  FROM flat_stats, 
  UNNEST(SPLIT(REGEXP_REPLACE(TO_JSON_STRING(flat_stats), r'{|}', ''))) pair
)
WHERE NOT LOWER(metric) IN ('name', 'team')   

This produces the following output:

Row name    team    metric      value    
1   joe     bills   t1_o_fga    7.0  
2   joe     bills   t1_d_fga    8.0  
3   joe     bills   t2_o_fga    3.0  
4   joe     bills   t2_d_fga    9.0  
5   joe     bills   t3_pts      3.0  
6   joe     bills   t3_ast      9.0  
7   joe     bills   t3_reb      5.0  
8   joe     bills   tov         7.0  
9   joe     bills   blk         3.0  
10  nick    jets    t1_o_fga    12.0     
11  nick    jets    t1_d_fga    13.0     
12  nick    jets    t2_o_fga    15.0     
13  nick    jets    t2_d_fga    22.0     
14  nick    jets    t3_pts      13.0     
15  nick    jets    t3_ast      5.0  
16  nick    jets    t3_reb      15.0     
17  nick    jets    tov         75.0     
18  nick    jets    blk         23.0      
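As a possible alternative to the JSON round trip, the long format can also be produced with the UNPIVOT operator once the columns are flat. This is a sketch rather than part of the original answer: it assumes UNPIVOT is available in your BigQuery environment and that all metric columns share one type (INT64 here). Only one input row is used to keep it short.

```sql
#standardSQL
WITH wide_stats AS (
  SELECT 'joe' name, 'bills' team,
    STRUCT(STRUCT(7 AS fga) AS o, STRUCT(8 AS fga) AS d) AS t1,
    STRUCT(STRUCT(3 AS fga) AS o, STRUCT(9 AS fga) AS d) AS t2,
    STRUCT(3 AS pts, 9 AS ast, 5 AS reb) AS t3, 7 tov, 3 blk
), flat_stats AS (
  -- Same flattening step as in the answer: one top-level column per leaf field.
  SELECT name, team,
    t1.o.fga AS t1_o_fga,
    t1.d.fga AS t1_d_fga,
    t2.o.fga AS t2_o_fga,
    t2.d.fga AS t2_d_fga,
    t3.pts AS t3_pts,
    t3.ast AS t3_ast,
    t3.reb AS t3_reb,
    tov, blk
  FROM wide_stats
)
-- UNPIVOT turns each listed column into a (metric, value) row.
SELECT name, team, metric, CAST(value AS FLOAT64) AS value
FROM flat_stats
UNPIVOT (value FOR metric IN (
  t1_o_fga, t1_d_fga, t2_o_fga, t2_d_fga, t3_pts, t3_ast, t3_reb, tov, blk
))
```

Compared with the string-splitting approach, this avoids parsing JSON text, so values are never passed through a string. Note that UNPIVOT drops rows whose value is NULL unless INCLUDE NULLS is specified, and the column list still has to be written out by hand.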

If for some reason you have a problem assembling the flat_stats temp table manually, you can use the small trick below.

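Step 1 - Run the query below in Legacy SQL mode, with the destination table set to [project:dataset.flat_stats]

#legacySQL
SELECT *
FROM [project:dataset.wide_stats]

"Surprisingly", this creates the table [project:dataset.flat_stats] with the structure below (Legacy SQL flattens nested records in query results by default, joining the field names with underscores):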

Row name    team    t1_o_fga    t1_d_fga    t2_o_fga    t2_d_fga    t3_pts  t3_ast  t3_reb  tov blk  
1   joe     bills   7           8           3           9           3       9       5       7   3    
2   nick    jets    12          13          15          22          13      5       15      75  23     

Step 2 - After that, you can simply run the query below (now in Standard SQL)

#standardSQL
SELECT name, team, metric, SAFE_CAST(value AS FLOAT64) value 
FROM (
  SELECT name, team, 
    REGEXP_REPLACE(SPLIT(pair, ':')[OFFSET(0)], r'^"|"$', '') metric, 
    REGEXP_REPLACE(SPLIT(pair, ':')[OFFSET(1)], r'^"|"$', '') value
  FROM `project.dataset.flat_stats` flat_stats, 
  UNNEST(SPLIT(REGEXP_REPLACE(TO_JSON_STRING(flat_stats), r'{|}', ''))) pair
)
WHERE NOT LOWER(metric) IN ('name', 'team')  
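If Legacy SQL is not an option, the flattened column list can perhaps be generated rather than typed. The sketch below is not from the original answer. It assumes wide_stats is stored as a real table in `project.dataset` (placeholder names, as in the steps above) and relies on the INFORMATION_SCHEMA.COLUMN_FIELD_PATHS view. It only builds the text of a SELECT list, which you would then paste into the flat_stats query by hand.

```sql
#standardSQL
-- Build "t1.o.fga AS t1_o_fga, ..." from the table's schema metadata.
SELECT
  STRING_AGG(
    CONCAT(field_path, ' AS ', REPLACE(field_path, '.', '_')),
    ', '
  ) AS select_list
FROM `project.dataset.INFORMATION_SCHEMA.COLUMN_FIELD_PATHS`
WHERE table_name = 'wide_stats'
  AND NOT STARTS_WITH(data_type, 'STRUCT')  -- keep leaf fields, skip the struct containers
```

The order of entries in select_list is not guaranteed without an ORDER BY inside STRING_AGG, and repeated (ARRAY) fields are not handled by this filter, so treat the result as a starting point to review rather than a finished query.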


Source: https://stackoverflow.com/questions/58776635/bigquery-reshape-table-with-structs-from-wide-to-long
