How to unnest and pivot two columns in BigQuery

Submitted by 社会主义新天地 on 2021-02-05 12:01:32

Question


Say I have a BigQuery table containing the following information:

| id    | test.name     | test.score    |
|----   |-----------    |------------   |
| 1     | a             | 5             |
|       | b             | 7             |
| 2     | a             | 8             |
|       | c             | 3             |

Here, test is a nested (repeated) field. How would I pivot test into the following table?

| id    | a     | b     | c     |
|----   |---    |---    |---    |
| 1     | 5     | 7     |       |
| 2     | 8     |       | 3     |

I cannot pivot test directly: pivot(test) fails with the error message "Table-valued function not found". Previous questions (1, 2) either don't deal with nested columns or are outdated.

The following query looks like a useful first step:

select a.id, t
from `table` as a,
unnest(test) as t

However, this just provides me with:

| id    | test.name     | test.score    |
|----   |-----------    |------------   |
| 1     | a             | 5             |
| 1     | b             | 7             |
| 2     | a             | 8             |
| 2     | c             | 3             |

Answer 1:


One option is to use conditional aggregation:

-- The subquery exposes each array element as the struct t,
-- so the outer query must reference t.name and t.score.
select id,
       max(case when t.name = 'a' then t.score end) as a,
       max(case when t.name = 'b' then t.score end) as b,
       max(case when t.name = 'c' then t.score end) as c
from
(
  select a.id, t
  from `table` as a,
  unnest(test) as t
) as unnested
group by id
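
As a side note, newer versions of BigQuery also provide a PIVOT operator, which this answer does not use. It is applied to a table or subquery in the FROM clause rather than called as a function, so the array has to be unnested first. The following is only a sketch under stated assumptions: the CTE named sample is a hypothetical inline copy of the question's data, and the pivoted names still have to be listed by hand.

```sql
-- Hypothetical inline sample mirroring the table in the question.
with sample as (
  select 1 as id,
         [struct('a' as name, 5 as score), struct('b' as name, 7 as score)] as test
  union all
  select 2 as id,
         [struct('a' as name, 8 as score), struct('c' as name, 3 as score)] as test
)
select *
from (
  -- Flatten the repeated field so that name and score become plain columns.
  select s.id, t.name, t.score
  from sample as s,
  unnest(s.test) as t
)
-- One output column per listed name; rows are grouped by the remaining column (id).
pivot (max(score) for name in ('a', 'b', 'c'))
order by id;
```

Names that are missing for a given id come back as null, as in the conditional aggregation above.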



Answer 2:


Conditional aggregation is a good approach. If your tables are large, you might find that this has the best performance:

-- Each scalar subquery scans the array of the current row only.
select t.id,
       (select max(tt.score) from unnest(t.test) tt where tt.name = 'a') as a,
       (select max(tt.score) from unnest(t.test) tt where tt.name = 'b') as b,
       (select max(tt.score) from unnest(t.test) tt where tt.name = 'c') as c
from `table` t;

I recommend this because it avoids the outer aggregation. The unnest() is evaluated within each row, without shuffling the data around, and I have found that this is a big win in terms of performance.
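
To try the per-row approach without a real table, a minimal sketch follows. The CTE named sample is a hypothetical inline stand-in for the question's data; everything else follows the pattern described above, with no GROUP BY because each input row already maps to exactly one output row.

```sql
-- Hypothetical inline sample mirroring the table in the question.
with sample as (
  select 1 as id,
         [struct('a' as name, 5 as score), struct('b' as name, 7 as score)] as test
  union all
  select 2 as id,
         [struct('a' as name, 8 as score), struct('c' as name, 3 as score)] as test
)
select s.id,
       -- Correlated subqueries: each one unnests only the current row's array.
       (select max(tt.score) from unnest(s.test) as tt where tt.name = 'a') as a,
       (select max(tt.score) from unnest(s.test) as tt where tt.name = 'b') as b,
       (select max(tt.score) from unnest(s.test) as tt where tt.name = 'c') as c
from sample as s
order by s.id;
-- Result: (1, 5, 7, null) and (2, 8, null, 3)
```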




Answer 3:


Below is a generic, dynamic way to handle your case:

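-- Build one MAX(IF(...)) column per distinct test name, then run the generated query.
EXECUTE IMMEDIATE (
  SELECT """
  SELECT id, """ ||
    -- ORDER BY inside STRING_AGG fixes the column order; a subquery ORDER BY does not guarantee it.
    STRING_AGG("""MAX(IF(name = '""" || name || """', score, NULL)) AS """ || name, ', ' ORDER BY name)
  || """
  FROM `project.dataset.table` t, t.test
  GROUP BY id
  """
  FROM (
    SELECT DISTINCT name
    FROM `project.dataset.table` t, t.test
  )
);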

Applied to the sample data from your question, the output is:

| Row   | id    | a     | b     | c     |
|----   |----   |---    |---    |---    |
| 1     | 1     | 5     | 7     | null  |
| 2     | 2     | 8     | null  | 3     |
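
To make the dynamic statement less opaque: for the sample data, the string assembled by the inner SELECT and handed to EXECUTE IMMEDIATE should be equivalent to the query below, whitespace aside. This assumes the distinct names are exactly a, b and c and that they are concatenated in alphabetical order.

```sql
-- Generated statement for names a, b, c (assumed order); one column per distinct name.
SELECT id,
       MAX(IF(name = 'a', score, NULL)) AS a,
       MAX(IF(name = 'b', score, NULL)) AS b,
       MAX(IF(name = 'c', score, NULL)) AS c
FROM `project.dataset.table` t, t.test
GROUP BY id
```

Because each name is spliced in directly after AS, this pattern only works as written when every name is a valid column identifier.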


Source: https://stackoverflow.com/questions/63989161/how-to-unnest-and-pivot-two-columns-in-bigquery
