Join uneven arrays from many columns and avoid duplicates in BigQuery

Submitted by 六月ゝ 毕业季﹏ on 2020-01-17 00:36:01

Question


I asked a similar question here, which I thought abstracted my problem sufficiently, but unfortunately it did not.

I have a table whose first column is an int and whose remaining columns are arrays. I can join two arrays without duplication (as answered in my previous question), but I'm unsure how to do it with more than two.

Here is the table (in Standard SQL):

WITH
  a AS (
  SELECT 
    1 AS col1,
    ARRAY[1, 2 ] AS col2,
    ARRAY[1, 2, 3] AS col3,
    ARRAY[1, 2, 3, 4] AS col4
  UNION ALL
  SELECT
    2 AS col1, 
    ARRAY[1, 2, 2] AS col2,
    ARRAY[1, 2, 3] AS col3,
    ARRAY[1, 2, 3, 4] AS col4
  UNION ALL
  SELECT
    3 AS col1,
    ARRAY[2, 2 ] AS col2,
    ARRAY[1, 2, 3] AS col3,
    ARRAY[1, 2, 3, 4] AS col4
    )
SELECT
  *
FROM
  a

Produces:

+-------++--------++--------++---------+
| col1   |   col2  |   col3  |   col4  |
+-------++--------++--------++---------+
|   1    |   1     |   1     |   1     |
|        |   2     |   2     |   2     |
|        |         |   3     |   3     |
|        |         |         |   4     |
|   2    |   1     |   1     |   1     |
|        |   2     |   2     |   2     |
|        |         |   3     |   3     |
|        |         |         |   4     |
|   3    |   1     |   1     |   1     |
|        |   2     |   2     |   2     |
|        |         |   3     |   3     |
|        |         |         |   4     |
+-------++--------++--------++---------+

But what I'm looking for is this:

+-------++--------++--------++---------+
| col1   |   col2  |   col3  |   col4  |
+-------++--------++--------++---------+
|   1    |   1     |   1     |   1     |
|  null  |   2     |   2     |   2     |
|  null  |  null   |   3     |   3     |
|  null  |  null   |  null   |   4     |
|   2    |   1     |   1     |   1     |
|  null  |   2     |   2     |   2     |
|  null  |  null   |   3     |   3     |
|  null  |  null   |  null   |   4     |
|   3    |   1     |   1     |   1     |
|  null  |   2     |   2     |   2     |
|  null  |  null   |   3     |   3     |
|  null  |  null   |  null   |   4     |
+-------++--------++--------++---------+

Here is how I'm unnesting the many columns:

SELECT
  col1,
  _col2,
  _col3
FROM
  a
LEFT JOIN
  UNNEST(col2) AS _col2
LEFT JOIN
  UNNEST(col3) AS _col3

Producing this table:

+-------++--------++--------+
| col1   |   col2  |   col3 |
+-------++--------++--------+
|   1    |   1     |   1    |
|   1    |   1     |   2    |
|   1    |   1     |   3    |
|   1    |   2     |   1    |
|   1    |   2     |   2    |
|   1    |   2     |   3    |
|   2    |   1     |   1    |
|   2    |   1     |   2    |
|   2    |   1     |   3    |
|   2    |   2     |   1    |
|   2    |   2     |   2    |
|   2    |   2     |   3    |
...
+-------++--------++--------+

Answer 1:


I don't fully understand how your results relate to the input data. The results for all the col1 values are exactly the same, but the inputs are different.

That said, I can interpret this as an extension of your previous question. This may be what you want:

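SELECT a.col1, c2, c3, c4
FROM (
  -- Add cs: the distinct values found across col2, col3 and col4 for this row.
  SELECT a.*,
         (SELECT ARRAY_AGG(DISTINCT c)
          FROM UNNEST(ARRAY_CONCAT(col2, col3, col4)) c
         ) AS cs
  FROM a
) a
CROSS JOIN UNNEST(cs) c                -- one set of output rows per distinct value
LEFT JOIN UNNEST(a.col2) c2 ON c2 = c  -- c2 is NULL when col2 lacks the value
LEFT JOIN UNNEST(a.col3) c3 ON c3 = c
LEFT JOIN UNNEST(a.col4) c4 ON c4 = c;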
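
To try the query against data, here is a self-contained sketch that runs it on two of the sample rows from the question. The ORDER BY is my addition so that the output order is deterministic, and the result shown in the comments is what I would expect from the join logic rather than output copied from the question.

```sql
-- Two of the sample rows: col1 = 1 and col1 = 3.
WITH a AS (
  SELECT 1 AS col1, [1, 2] AS col2, [1, 2, 3] AS col3, [1, 2, 3, 4] AS col4
  UNION ALL
  SELECT 3, [2, 2], [1, 2, 3], [1, 2, 3, 4]
)
SELECT a.col1, c2, c3, c4
FROM (
  SELECT a.*,
         (SELECT ARRAY_AGG(DISTINCT c)
          FROM UNNEST(ARRAY_CONCAT(col2, col3, col4)) AS c) AS cs
  FROM a
) AS a
CROSS JOIN UNNEST(a.cs) AS c
LEFT JOIN UNNEST(a.col2) AS c2 ON c2 = c
LEFT JOIN UNNEST(a.col3) AS c3 ON c3 = c
LEFT JOIN UNNEST(a.col4) AS c4 ON c4 = c
ORDER BY a.col1, c;  -- added for a stable row order
-- col1 | c2   | c3   | c4
--    1 | 1    | 1    | 1
--    1 | 2    | 2    | 2
--    1 | NULL | 3    | 3
--    1 | NULL | NULL | 4
--    3 | NULL | 1    | 1
--    3 | 2    | 2    | 2
--    3 | 2    | 2    | 2
--    3 | NULL | 3    | 3
--    3 | NULL | NULL | 4
```

Two things to notice: col1 is repeated on every row rather than shown as NULL, and a value that occurs twice inside one array (the two 2s in col2 for col1 = 3) still produces two rows, because the left join matches each occurrence.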

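The subquery aliased as a adds an array, cs, holding the distinct values that appear in any of the three arrays. Each value in cs drives one set of output rows, and every original array is then left-joined to it on equality, so a column shows NULL wherever its array does not contain that value.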
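
If the arrays are instead meant to be lined up by position (first element with first element, and so on), which is only my guess from the desired output and is not stated in the question, a different sketch is to generate every position up to the longest array and index each array with SAFE_OFFSET. The names row_id and pos are illustrative.

```sql
-- Sketch only: assumes the arrays should be matched by position, which the
-- question does not state. row_id and pos are illustrative names.
WITH a AS (
  SELECT 2 AS col1, [1, 2, 2] AS col2, [1, 2, 3] AS col3, [1, 2, 3, 4] AS col4
)
SELECT
  a.col1 AS row_id,
  pos,
  a.col2[SAFE_OFFSET(pos)] AS c2,  -- NULL once pos runs past the end of col2
  a.col3[SAFE_OFFSET(pos)] AS c3,
  a.col4[SAFE_OFFSET(pos)] AS c4
FROM a
CROSS JOIN UNNEST(GENERATE_ARRAY(
  0,
  GREATEST(ARRAY_LENGTH(a.col2), ARRAY_LENGTH(a.col3), ARRAY_LENGTH(a.col4)) - 1
)) AS pos
ORDER BY row_id, pos;
-- row_id | pos | c2   | c3   | c4
--      2 |   0 | 1    | 1    | 1
--      2 |   1 | 2    | 2    | 2
--      2 |   2 | 2    | 3    | 3
--      2 |   3 | NULL | NULL | 4
```

This version keeps repeated values such as the second 2 in col2 and never multiplies rows, but it pairs elements purely by index. A row whose arrays are all empty disappears from the result, and a NULL array makes GREATEST return NULL, so both cases would need extra handling. Replacing row_id with IF(pos = 0, a.col1, NULL) should blank out the repeated col1 values, as in the desired output.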



Source: https://stackoverflow.com/questions/56586399/join-uneven-arrays-from-many-columns-and-avoid-duplicates-in-bigquery
