Question
I deleted the previous version of this post in favor of this cleaned-up version with a reproducible example. I have a table in the following format, shown here together with my current attempt at reshaping it:
WITH wide_stats AS (
  (
    SELECT
      'joe' name, 'bills' team,
      struct(struct(7 as fga) as o, struct(8 as fga) as d) as t1,
      struct(struct(3 as fga) as o, struct(9 as fga) as d) as t2,
      struct(3 as pts, 9 as ast, 5 as reb) as t3,
      7 tov, 3 blk
  ) UNION ALL (
    SELECT 'nick' name, 'jets' team,
      struct(struct(12 as fga) as o, struct(13 as fga) as d) as t1,
      struct(struct(15 as fga) as o, struct(22 as fga) as d) as t2,
      struct(13 as pts, 5 as ast, 15 as reb) as t3,
      75 tov, 23 blk
  )
)
SELECT
  name, team, metric, SAFE_CAST(value AS FLOAT64) value
FROM (
  SELECT
    name, team,
    REGEXP_REPLACE(SPLIT(pair, ':')[OFFSET(0)], r'^"|"$', '') metric,
    REGEXP_REPLACE(SPLIT(pair, ':')[OFFSET(1)], r'^"|"$', '') value
  FROM wide_stats,
  UNNEST(SPLIT(REGEXP_REPLACE(to_json_string(wide_stats), r'{|}', ''))) pair
)
WHERE NOT LOWER(metric) IN ('name', 'team')
and I am working to reshape the table into the following output:
name team metric value
joe bills t1_o_fga 7
joe bills t1_d_fga 8
joe bills t2_o_fga 3
joe bills t2_d_fga 9
joe bills t3_pts 3
joe bills t3_ast 9
joe bills t3_reb 5
joe bills tov 7
joe bills blk 3
nick jets t1_o_fga 12
nick jets t1_d_fga 13
nick jets t2_o_fga 15
nick jets t2_d_fga 22
nick jets t3_pts 13
nick jets t3_ast 5
nick jets t3_reb 15
nick jets tov 75
nick jets blk 23
The task is simple to explain: reshape from wide to long, but with structs and nested structs in the table. My regex attempt, adapted from another Stack Overflow post, splits up the column names incorrectly, so the current output doesn't match what I need.
Row order doesn't matter. As for the names, it doesn't matter whether it's t1_o_fga, t1-o-fga, or t1/o/fga, so long as there's some separator and it's clear what the variable is. Any help or direction is super appreciated, thanks!
Answer 1:
Below is for BigQuery Standard SQL
#standardSQL
WITH wide_stats AS (
  SELECT 'joe' name, 'bills' team,
    STRUCT(STRUCT(7 AS fga) AS o, STRUCT(8 AS fga) AS d) AS t1,
    STRUCT(STRUCT(3 AS fga) AS o, STRUCT(9 AS fga) AS d) AS t2,
    STRUCT(3 AS pts, 9 AS ast, 5 AS reb) AS t3, 7 tov, 3 blk UNION ALL
  SELECT 'nick' name, 'jets' team,
    STRUCT(STRUCT(12 AS fga) AS o, STRUCT(13 AS fga) AS d) AS t1,
    STRUCT(STRUCT(15 AS fga) AS o, STRUCT(22 AS fga) AS d) AS t2,
    STRUCT(13 AS pts, 5 AS ast, 15 AS reb) AS t3, 75 tov, 23 blk
), flat_stats AS (
  -- Flatten every struct field into a top-level column with a path-style name
  SELECT name, team,
    t1.o.fga AS t1_o_fga,
    t1.d.fga AS t1_d_fga,
    t2.o.fga AS t2_o_fga,
    t2.d.fga AS t2_d_fga,
    t3.pts AS t3_pts,
    t3.ast AS t3_ast,
    t3.reb AS t3_reb,
    tov, blk
  FROM wide_stats
)
SELECT name, team, metric, SAFE_CAST(value AS FLOAT64) value
FROM (
  -- Each pair is one "column":value piece of the row's JSON; strip the quotes
  SELECT name, team,
    REGEXP_REPLACE(SPLIT(pair, ':')[OFFSET(0)], r'^"|"$', '') metric,
    REGEXP_REPLACE(SPLIT(pair, ':')[OFFSET(1)], r'^"|"$', '') value
  FROM flat_stats,
  -- Serialize the row to JSON, drop the braces, and split on commas
  UNNEST(SPLIT(REGEXP_REPLACE(TO_JSON_STRING(flat_stats), r'{|}', ''))) pair
)
-- Keep only the metric columns; name and team stay as row identifiers
WHERE NOT LOWER(metric) IN ('name', 'team')
with output
Row name team metric value
1 joe bills t1_o_fga 7.0
2 joe bills t1_d_fga 8.0
3 joe bills t2_o_fga 3.0
4 joe bills t2_d_fga 9.0
5 joe bills t3_pts 3.0
6 joe bills t3_ast 9.0
7 joe bills t3_reb 5.0
8 joe bills tov 7.0
9 joe bills blk 3.0
10 nick jets t1_o_fga 12.0
11 nick jets t1_d_fga 13.0
12 nick jets t2_o_fga 15.0
13 nick jets t2_d_fga 22.0
14 nick jets t3_pts 13.0
15 nick jets t3_ast 5.0
16 nick jets t3_reb 15.0
17 nick jets tov 75.0
18 nick jets blk 23.0
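To see why the flattening step matters, it can help to look at the intermediate string that the JSON trick actually splits. The query below is a minimal sketch that uses a cut-down, hypothetical one-row input (the alias t and the reduced column set are assumptions for illustration, not the original table) and applies the same TO_JSON_STRING and REGEXP_REPLACE calls to a nested struct:

```sql
#standardSQL
-- Illustrative only: a reduced one-row input with a single nested struct.
SELECT
  TO_JSON_STRING(t) AS raw_json,                                  -- row serialized as JSON
  REGEXP_REPLACE(TO_JSON_STRING(t), r'{|}', '') AS braces_removed -- same string without { and }
FROM (
  SELECT 'joe' AS name,
    STRUCT(STRUCT(7 AS fga) AS o, STRUCT(8 AS fga) AS d) AS t1
) t
```

With the braces stripped, the nested struct should come out as something like "t1":"o":"fga":7,"d":"fga":8, so splitting each comma-separated piece on ':' and taking offsets 0 and 1 gives metric t1 with value "o" instead of t1_o_fga with value 7. Once the columns are flattened, every piece is a plain "column":value pair and the split behaves as intended.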
If for some reason you have trouble assembling the flat_stats temp table manually, you can use a small trick like the one below.
Step 1 - Run the query below in legacy SQL mode, with the destination table set to [project:dataset.flat_stats]
#legacySQL
SELECT *
FROM [project:dataset.wide_stats]
"Surprisingly", this will create the table [project:dataset.flat_stats] with the structure below, because legacy SQL flattens nested results by default:
Row  name  team   t1_o_fga  t1_d_fga  t2_o_fga  t2_d_fga  t3_pts  t3_ast  t3_reb  tov  blk
1    joe   bills  7         8         3         9         3       9       5       7    3
2    nick  jets   12        13        15        22        13      5       15      75   23
Step 2 - After that, you can simply run the query below (now in Standard SQL)
#standardSQL
SELECT name, team, metric, SAFE_CAST(value AS FLOAT64) value
FROM (
  SELECT name, team,
    REGEXP_REPLACE(SPLIT(pair, ':')[OFFSET(0)], r'^"|"$', '') metric,
    REGEXP_REPLACE(SPLIT(pair, ':')[OFFSET(1)], r'^"|"$', '') value
  FROM `project.dataset.flat_stats` flat_stats,
  UNNEST(SPLIT(REGEXP_REPLACE(TO_JSON_STRING(flat_stats), r'{|}', ''))) pair
)
WHERE NOT LOWER(metric) IN ('name', 'team')
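As an alternative to the JSON string trick, BigQuery Standard SQL also offers an UNPIVOT operator that can turn already-flattened columns into rows directly. The sketch below is not part of the original answer; it assumes that operator is available in your environment and that the flattened table from Step 1 exists as `project.dataset.flat_stats` with the column names shown above:

```sql
#standardSQL
-- Sketch: unpivot the already-flattened columns without going through JSON.
-- Assumes every listed column has the same type (INT64 here).
SELECT name, team, metric, CAST(value AS FLOAT64) AS value
FROM `project.dataset.flat_stats`
UNPIVOT (
  value FOR metric IN (
    t1_o_fga, t1_d_fga, t2_o_fga, t2_d_fga,
    t3_pts, t3_ast, t3_reb, tov, blk
  )
)
```

The trade-off is that the column list has to be spelled out, whereas the JSON approach picks up every column automatically. Note also that UNPIVOT excludes rows whose value is NULL by default.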
Source: https://stackoverflow.com/questions/58776635/bigquery-reshape-table-with-structs-from-wide-to-long