Question
I asked a similar question here that I thought abstracted my problem sufficiently, but unfortunately it did not.
I have a table of nested arrays whose first column is an int. I can join two arrays without duplication (as answered in my previous question), but I'm unsure how to do it with more than two.
Here is the table (in Standard SQL):
WITH
  a AS (
    SELECT
      1 AS col1,
      ARRAY[1, 2] AS col2,
      ARRAY[1, 2, 3] AS col3,
      ARRAY[1, 2, 3, 4] AS col4
    UNION ALL
    SELECT
      2 AS col1,
      ARRAY[1, 2, 2] AS col2,
      ARRAY[1, 2, 3] AS col3,
      ARRAY[1, 2, 3, 4] AS col4
    UNION ALL
    SELECT
      3 AS col1,
      ARRAY[2, 2] AS col2,
      ARRAY[1, 2, 3] AS col3,
      ARRAY[1, 2, 3, 4] AS col4
  )
SELECT
  *
FROM
  a
Produces:
+-------++--------++--------++---------+
| col1 | col2 | col3 | col4 |
+-------++--------++--------++---------+
| 1 | 1 | 1 | 1 |
| | 2 | 2 | 2 |
| | | 3 | 3 |
| | | | 4 |
| 2 | 1 | 1 | 1 |
| | 2 | 2 | 2 |
| | | 3 | 3 |
| | | | 4 |
| 3 | 1 | 1 | 1 |
| | 2 | 2 | 2 |
| | | 3 | 3 |
| | | | 4 |
+-------++--------++--------++---------+
But what I'm looking for is this:
+-------++--------++--------++---------+
| col1 | col2 | col3 | col4 |
+-------++--------++--------++---------+
| 1 | 1 | 1 | 1 |
| null | 2 | 2 | 2 |
| null | null | 3 | 3 |
| null | null | null | 4 |
| 2 | 1 | 1 | 1 |
| null | 2 | 2 | 2 |
| null | null | 3 | 3 |
| null | null | null | 4 |
| 3 | 1 | 1 | 1 |
| null | 2 | 2 | 2 |
| null | null | 3 | 3 |
| null | null | null | 4 |
+-------++--------++--------++---------+
Here is how I'm unnesting multiple columns:
SELECT
  col1,
  _col2,
  _col3
FROM
  a
  LEFT JOIN UNNEST(col2) AS _col2
  LEFT JOIN UNNEST(col3) AS _col3
Producing this table:
+-------++--------++--------+
| col1 | col2 | col3 |
+-------++--------++--------+
| 1 | 1 | 1 |
| 1 | 1 | 2 |
| 1 | 1 | 3 |
| 1 | 2 | 1 |
| 1 | 2 | 2 |
| 1 | 2 | 3 |
| 2 | 1 | 1 |
| 2 | 1 | 2 |
| 2 | 1 | 3 |
| 2 | 2 | 1 |
| 2 | 2 | 2 |
| 2 | 2 | 3 |
...
...
...
+-------++--------++--------+
Answer 1:
I don't fully understand how your results relate to the input data. The results for all the col1 values are exactly the same, but the inputs are different.
That said, I can interpret this as an extension of your previous question. This may be what you want:
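SELECT a.col1, c2, c3, c4
FROM (
  SELECT
    a.*,
    -- cs: the distinct values found across col2, col3, and col4 for this row
    (SELECT ARRAY_AGG(DISTINCT c) cs
     FROM UNNEST(ARRAY_CONCAT(col2, col3, col4)) c
    ) cs
  FROM a
) a
-- one output row per distinct value, with the matching element of each array
CROSS JOIN UNNEST(cs) c
LEFT JOIN UNNEST(a.col2) c2 ON c2 = c
LEFT JOIN UNNEST(a.col3) c3 ON c3 = c
LEFT JOIN UNNEST(a.col4) c4 ON c4 = c;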
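The initial subquery for a generates, for each row, an array cs of all the distinct values in the three arrays. Each value in cs is then used as the key for a left join against each array, so a column is NULL wherever its array does not contain that value.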
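To make that join logic easier to follow, here is a minimal Python sketch that mimics what the query does for a single row. It is only a model, not BigQuery code: the function name align_by_value is made up for illustration, and the distinct values are sorted purely to get a deterministic result (ARRAY_AGG(DISTINCT ...) does not promise any order).

```python
from itertools import product


def align_by_value(row):
    """Model the query for one input row of the form (col1, col2, col3, col4)."""
    col1, *arrays = row
    # Stand-in for ARRAY_AGG(DISTINCT c) over ARRAY_CONCAT(col2, col3, col4).
    # Sorting is an assumption made here so the output order is predictable.
    distinct_values = sorted(set().union(*arrays))
    out = []
    for c in distinct_values:
        # Stand-in for LEFT JOIN UNNEST(arr) x ON x = c:
        # every matching element, or a single None when nothing matches.
        matches = [[x for x in arr if x == c] or [None] for arr in arrays]
        # Chained joins combine the matches, so a value repeated inside
        # one array yields repeated output rows.
        for combo in product(*matches):
            out.append((col1, *combo))
    return out


rows = [
    (1, [1, 2], [1, 2, 3], [1, 2, 3, 4]),
    (2, [1, 2, 2], [1, 2, 3], [1, 2, 3, 4]),
    (3, [2, 2], [1, 2, 3], [1, 2, 3, 4]),
]
result = [r for row in rows for r in align_by_value(row)]
```

For col1 = 1 this gives the four rows shown in the desired output. For col1 = 2 and col1 = 3, the repeated 2 in col2 produces two rows for the value 2, so if repeats inside an array should collapse, the arrays would need to be de-duplicated first.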
Source: https://stackoverflow.com/questions/56586399/join-uneven-arrays-from-many-columns-and-avoid-duplicates-in-bigquery