Question
I have one table (Table 1) that looks like this:
keys
AAB12B34
CC34DE5W
SEF5C6T4
SQA7ZZ87
LM24NO3P
X34YY78Z
And another table (Table 2) that looks like this:
category_id  category_name  associated_keys
111          Books          CC34DE5W|SQA7ZZ87|LM24NO3P
222          Office         LM24NO3P|AAB12B34
444          Furniture      X34YY78Z|LM24NO3P|SQA7ZZ87|SEF5C6T4|CC34DE5W|AAB12B34
222          Office         X34YY78Z
I want to do two tasks:
Task 1:
At any given point I want only one row per category_id. If an id is repeated across two or more rows, I want to group by category_id and concatenate the associated_keys. The output should look like this:
Table 3:
category_id  category_name  associated_keys
111          Books          CC34DE5W|SQA7ZZ87|LM24NO3P
222          Office         LM24NO3P|AAB12B34|X34YY78Z
444          Furniture      X34YY78Z|LM24NO3P|SQA7ZZ87|SEF5C6T4|CC34DE5W|AAB12B34
Task 2:
Then I want to turn every value in the keys column of Table 1 into a column name and check it against Table 2. If a category_id has that key in its associated_keys field, the column should contain 1 for that row, otherwise 0.
The final result should look like this:
category_id  category_name  AAB12B34  CC34DE5W  SEF5C6T4  SQA7ZZ87  LM24NO3P  X34YY78Z
111          Books          0         1         0         1         1         0
222          Office         1         0         0         0         1         1
444          Furniture      1         1         1         1         1         1
Answer 1:
The following is for BigQuery Standard SQL and takes into account all the new details about your case. I hope you can get this cleaned-up version to work for you. Note that it uses the same logic as my initial answer, with some corrections: it handles column names that start with a digit (by adding a col_ prefix), uses the correct column name in Table 1, and so on.
-- Build the text of the pivot SELECT: one IF(...) column per row of the keys table.
DECLARE statement STRING;
SET statement = (
  WITH task1 AS (
    SELECT category_id, category_name, STRING_AGG(associated_keys, '|') AS associated_keys
    FROM `your_project.your_dataset.data`
    GROUP BY category_id, category_name
  )
  SELECT '''
  SELECT
    category_id,
    category_name, ''' || (
    SELECT STRING_AGG('IF("' || ids || '" IN UNNEST(keys), 1, 0) AS col_' || ids, ', ')
    FROM `your_project.your_dataset.keys`
  ) || ''' FROM task1, UNNEST([STRUCT(SPLIT(associated_keys, "|") AS keys)])'''
);
-- Run the generated SELECT on top of the Task 1 aggregation.
EXECUTE IMMEDIATE '''
WITH task1 AS (
  SELECT category_id, category_name, STRING_AGG(associated_keys, '|') AS associated_keys
  FROM `your_project.your_dataset.data`
  GROUP BY category_id, category_name
)''' || statement;
The above assumes the following table structure (and sample data):
`your_project.your_dataset.keys` AS (
SELECT 'AAB12B34' ids UNION ALL
SELECT '34DE5WCC' UNION ALL
SELECT 'SEF5C6T4' UNION ALL
SELECT 'SQA7ZZ87' UNION ALL
SELECT '24NO3PLM' UNION ALL
SELECT 'X34YY78Z'
), `your_project.your_dataset.data` AS (
SELECT 111 category_id, 'Books' category_name, '34DE5WCC|SQA7ZZ87|24NO3PLM|SQA7ZZ87|sample300|sample300' associated_keys UNION ALL
SELECT 222, 'Office', '24NO3PLM|AAB12B34|X34YY78Z' UNION ALL
SELECT 444, 'Furniture', 'X34YY78Z|24NO3PLM|SQA7ZZ87|SEF5C6T4|34DE5WCC|AAB12B34|sample200' UNION ALL
SELECT 222, 'Office', 'X34YY78Z|sample100' UNION ALL
SELECT 111, 'Books', 'AAB12B34'
)
Applying the code to the sample data above gives this result:
Row  category_id  category_name  col_AAB12B34  col_34DE5WCC  col_SEF5C6T4  col_SQA7ZZ87  col_24NO3PLM  col_X34YY78Z
1    111          Books          1             1             0             1             1             0
2    222          Office         1             0             0             0             1             1
3    444          Furniture      1             1             1             1             1             1
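The col_ prefix matters because keys such as 34DE5WCC start with a digit, and an unquoted identifier has to begin with a letter or underscore. As a minimal sketch, this is what two of the generated columns look like when run on their own; the inline key string is only an illustrative stand-in for the real tables:

```sql
#standardSQL
-- Two generated columns, evaluated against an inline key list.
-- The literal '34DE5WCC|SQA7ZZ87' is illustrative, not data from the real tables.
SELECT
  IF("34DE5WCC" IN UNNEST(keys), 1, 0) AS col_34DE5WCC,  -- key present in the list
  IF("SEF5C6T4" IN UNNEST(keys), 1, 0) AS col_SEF5C6T4   -- key absent from the list
FROM UNNEST([STRUCT(SPLIT("34DE5WCC|SQA7ZZ87", "|") AS keys)])
```

Without the prefix, the first alias would be 34DE5WCC, which the parser would not accept as a plain identifier.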
Answer 2:
Below is for BigQuery Standard SQL
Task 1
#standardSQL
SELECT category_id, category_name, STRING_AGG(associated_keys, '|') AS associated_keys
FROM `project.dataset.data`
GROUP BY category_id, category_name
Applied to the sample data from your question, as in the example below:
#standardSQL
WITH `project.dataset.data` AS (
SELECT 111 category_id, 'Books' category_name, 'CC34DE5W|SQA7ZZ87|LM24NO3P' associated_keys UNION ALL
SELECT 222, 'Office', 'LM24NO3P|AAB12B34' UNION ALL
SELECT 444, 'Furniture', 'X34YY78Z|LM24NO3P|SQA7ZZ87|SEF5C6T4|CC34DE5W|AAB12B34' UNION ALL
SELECT 222, 'Office', 'X34YY78Z'
)
SELECT category_id, category_name, STRING_AGG(associated_keys, '|') AS associated_keys
FROM `project.dataset.data`
GROUP BY category_id, category_name
the output is:
Row  category_id  category_name  associated_keys
1    111          Books          CC34DE5W|SQA7ZZ87|LM24NO3P
2    222          Office         LM24NO3P|AAB12B34|X34YY78Z
3    444          Furniture      X34YY78Z|LM24NO3P|SQA7ZZ87|SEF5C6T4|CC34DE5W|AAB12B34
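Note that STRING_AGG simply concatenates the strings, so a key that appears in more than one row of the same category shows up more than once in the merged list. That does not change the 1/0 result of Task 2, but if a clean list matters, one possible sketch is to split the strings first and aggregate the distinct keys. The two sample rows here are made up to show an overlap, and the order of keys in the output is not guaranteed:

```sql
#standardSQL
-- Hypothetical rows in which SQA7ZZ87 appears twice for the same category.
WITH `project.dataset.data` AS (
  SELECT 111 category_id, 'Books' category_name, 'CC34DE5W|SQA7ZZ87' associated_keys UNION ALL
  SELECT 111, 'Books', 'SQA7ZZ87|LM24NO3P'
)
SELECT
  category_id,
  category_name,
  -- Split each list into single keys, then re-join only the distinct ones.
  STRING_AGG(DISTINCT k, '|') AS associated_keys
FROM `project.dataset.data`,
  UNNEST(SPLIT(associated_keys, '|')) AS k
GROUP BY category_id, category_name
```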
Task 2: this uses the scripting feature of BigQuery to build the required pivot statement dynamically. It does not depend on the number or names of the keys, so there is no need to write the SELECT statement by hand.
-- Build the text of the pivot SELECT: one IF(...) column per row of the keys table.
DECLARE statement STRING;
SET statement = (
  WITH task1 AS (
    SELECT category_id, category_name, STRING_AGG(associated_keys, '|') AS associated_keys
    FROM `project.dataset.data`
    GROUP BY category_id, category_name
  )
  SELECT '''
  SELECT
    category_id,
    category_name, ''' || (
    SELECT STRING_AGG('IF("' || key || '" IN UNNEST(keys), 1, 0) AS ' || key, ', ')
    FROM `project.dataset.keys`
  ) || ''' FROM task1, UNNEST([STRUCT(SPLIT(associated_keys, "|") AS keys)])'''
);
-- Run the generated SELECT on top of the Task 1 aggregation.
EXECUTE IMMEDIATE '''
WITH task1 AS (
  SELECT category_id, category_name, STRING_AGG(associated_keys, '|') AS associated_keys
  FROM `project.dataset.data`
  GROUP BY category_id, category_name
)''' || statement;
When applied to the sample data from your question, the output is:
Row  category_id  category_name  AAB12B34  CC34DE5W  SEF5C6T4  SQA7ZZ87  LM24NO3P  X34YY78Z
1    111          Books          0         1         0         1         1         0
2    222          Office         1         0         0         0         1         1
3    444          Furniture      1         1         1         1         1         1
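For reference, with the six keys from the question the script should end up running a statement equivalent to the one below. This is a hand-expanded sketch with the sample data inlined so that it runs on its own. The column order in the real script follows whatever order STRING_AGG reads the keys table in, which is not guaranteed without an ORDER BY.

```sql
#standardSQL
-- Sample data from the question, inlined so the query is self-contained.
WITH `project.dataset.data` AS (
  SELECT 111 category_id, 'Books' category_name, 'CC34DE5W|SQA7ZZ87|LM24NO3P' associated_keys UNION ALL
  SELECT 222, 'Office', 'LM24NO3P|AAB12B34' UNION ALL
  SELECT 444, 'Furniture', 'X34YY78Z|LM24NO3P|SQA7ZZ87|SEF5C6T4|CC34DE5W|AAB12B34' UNION ALL
  SELECT 222, 'Office', 'X34YY78Z'
), task1 AS (
  -- Task 1: one row per category with all key lists concatenated.
  SELECT category_id, category_name, STRING_AGG(associated_keys, '|') AS associated_keys
  FROM `project.dataset.data`
  GROUP BY category_id, category_name
)
SELECT
  category_id,
  category_name,
  -- One column per key: 1 if the key is in the category's list, else 0.
  IF("AAB12B34" IN UNNEST(keys), 1, 0) AS AAB12B34,
  IF("CC34DE5W" IN UNNEST(keys), 1, 0) AS CC34DE5W,
  IF("SEF5C6T4" IN UNNEST(keys), 1, 0) AS SEF5C6T4,
  IF("SQA7ZZ87" IN UNNEST(keys), 1, 0) AS SQA7ZZ87,
  IF("LM24NO3P" IN UNNEST(keys), 1, 0) AS LM24NO3P,
  IF("X34YY78Z" IN UNNEST(keys), 1, 0) AS X34YY78Z
FROM task1, UNNEST([STRUCT(SPLIT(associated_keys, "|") AS keys)])
```

Running SELECT statement; after the SET step is a simple way to inspect the generated text before executing it.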
Answer 3:
You can try this:
First Answer:
SELECT
category_id, category_name, STRING_AGG(associated_keys, "|") AS associated_keys
FROM
dataset.table2
GROUP BY
category_id, category_name;
Second Answer:
Standard SQL Option
SELECT
  category_id,
  category_name,
  keys,
  IF(keys LIKE '%AAB12B34%', 1, 0) AS AAB12B34,
  IF(keys LIKE '%CC34DE5W%', 1, 0) AS CC34DE5W,
  IF(keys LIKE '%SEF5C6T4%', 1, 0) AS SEF5C6T4,
  IF(keys LIKE '%SQA7ZZ87%', 1, 0) AS SQA7ZZ87,
  IF(keys LIKE '%LM24NO3P%', 1, 0) AS LM24NO3P,
  IF(keys LIKE '%X34YY78Z%', 1, 0) AS X34YY78Z
FROM (
  -- Task 1 aggregation, inlined as a subquery.
  SELECT category_id, category_name, STRING_AGG(associated_keys, "|") AS keys
  FROM dataset.table2
  GROUP BY category_id, category_name
);
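One caveat with LIKE '%key%' is that it matches substrings, so it would also report a hit if one key happened to be contained in another. With the fixed-length keys in the question that cannot happen, but if it is a concern, an exact-match variant can be sketched by splitting the string instead. Here table2 is inlined as a CTE with some of the question's sample rows so the query runs on its own, and only two key columns are shown:

```sql
#standardSQL
-- Inline stand-in for table2, using sample rows from the question.
WITH table2 AS (
  SELECT 111 category_id, 'Books' category_name, 'CC34DE5W|SQA7ZZ87|LM24NO3P' associated_keys UNION ALL
  SELECT 222, 'Office', 'LM24NO3P|AAB12B34' UNION ALL
  SELECT 222, 'Office', 'X34YY78Z'
)
SELECT
  category_id,
  category_name,
  -- Exact match against the individual keys rather than a substring search.
  IF('AAB12B34' IN UNNEST(SPLIT(keys, '|')), 1, 0) AS AAB12B34,
  IF('CC34DE5W' IN UNNEST(SPLIT(keys, '|')), 1, 0) AS CC34DE5W
FROM (
  SELECT category_id, category_name, STRING_AGG(associated_keys, '|') AS keys
  FROM table2
  GROUP BY category_id, category_name
)
```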
Dynamic SQL Option
Create a View:
CREATE OR REPLACE VIEW
dataset.vw_table2 AS
SELECT
category_id, category_name, STRING_AGG(associated_keys, '|') AS associated_keys
FROM
dataset.table2
GROUP BY category_id, category_name;
Create Dynamic SQL:
DECLARE
dynamicsql STRING;
SET
dynamicsql = (
SELECT
'select category_id, category_name, ' || STRING_AGG('IF(associated_keys LIKE "%' || keys || '%", 1, 0) AS ' || keys, ',') || ' from dataset.vw_table2'
FROM
dataset.table1);
EXECUTE IMMEDIATE
dynamicsql;
Source: https://stackoverflow.com/questions/63642654/restructure-table-and-check-for-values