Restructure table and check for values

Question

I have one table (Table 1) that looks like this:

keys
AAB12B34
CC34DE5W
SEF5C6T4
SQA7ZZ87
LM24NO3P
X34YY78Z

And another table (Table 2) that looks like this:

category_id   category_name    associated_keys
    111          Books         CC34DE5W|SQA7ZZ87|LM24NO3P
    222          Office        LM24NO3P|AAB12B34
    444         Furniture      X34YY78Z|LM24NO3P|SQA7ZZ87|SEF5C6T4|CC34DE5W|AAB12B34
    222          Office        X34YY78Z

I want to do two tasks:

Task 1:

At any given point I want only one row for each category_id. If there are two rows (that is, if the id is repeated), I want to group by category_id and append the associated_keys, separated by |. So the output should look like this:

Table 3:

category_id   category_name    associated_keys
    111          Books         CC34DE5W|SQA7ZZ87|LM24NO3P
    222          Office        LM24NO3P|AAB12B34|X34YY78Z
    444         Furniture      X34YY78Z|LM24NO3P|SQA7ZZ87|SEF5C6T4|CC34DE5W|AAB12B34
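As a local sanity check of the Task 1 logic (plain Python, not BigQuery code), the sketch below groups rows by (category_id, category_name) and joins their associated_keys with |, which is what STRING_AGG(associated_keys, '|') with GROUP BY does in the answers. The in-memory table2 list is just the sample data above, and merge_categories is an illustrative name. Python keeps rows in first-seen order; BigQuery's STRING_AGG does not guarantee any order without an ORDER BY.

```python
# Sample rows from Table 2: (category_id, category_name, associated_keys)
table2 = [
    (111, "Books", "CC34DE5W|SQA7ZZ87|LM24NO3P"),
    (222, "Office", "LM24NO3P|AAB12B34"),
    (444, "Furniture", "X34YY78Z|LM24NO3P|SQA7ZZ87|SEF5C6T4|CC34DE5W|AAB12B34"),
    (222, "Office", "X34YY78Z"),
]


def merge_categories(rows):
    """Return one row per (category_id, category_name) with keys joined by '|'."""
    groups = {}  # dicts keep insertion order (Python 3.7+)
    for category_id, category_name, keys in rows:
        groups.setdefault((category_id, category_name), []).append(keys)
    return [(cid, name, "|".join(parts)) for (cid, name), parts in groups.items()]


table3 = merge_categories(table2)
for row in table3:
    print(row)
```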

Task 2:

Then I want to turn every value in the keys field of Table 1 into a column name and check Table 2: if a category_id has that key in its associated_keys field, put 1 under that key's column; otherwise put 0.

The final result should look like this:

category_id  category_name AAB12B34  CC34DE5W   SEF5C6T4    SQA7ZZ87   LM24NO3P   X34YY78Z
    111         Books         0          1         0            1          1          0
    222        Office         1          0         0            0          1          1
    444       Furniture       1          1         1            1          1          1
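The expected result above can be reproduced with a short Python sketch (a local check of the logic, not BigQuery code). It splits each merged associated_keys string on | and emits 1 or 0 per Table 1 key, which is the same whole-key test as IF(key IN UNNEST(keys), 1, 0) in the SQL answers. The names table1_keys, table3 and pivot_flags are illustrative only.

```python
# Keys from Table 1, in the order they should appear as columns
table1_keys = ["AAB12B34", "CC34DE5W", "SEF5C6T4", "SQA7ZZ87", "LM24NO3P", "X34YY78Z"]

# Table 3: one row per category after the Task 1 merge
table3 = [
    (111, "Books", "CC34DE5W|SQA7ZZ87|LM24NO3P"),
    (222, "Office", "LM24NO3P|AAB12B34|X34YY78Z"),
    (444, "Furniture", "X34YY78Z|LM24NO3P|SQA7ZZ87|SEF5C6T4|CC34DE5W|AAB12B34"),
]


def pivot_flags(rows, keys):
    """Return a header plus one row per category with a 1/0 flag for each key."""
    header = ["category_id", "category_name"] + keys
    result = []
    for category_id, category_name, associated in rows:
        # Split on '|' and compare whole keys, not substrings
        present = set(associated.split("|"))
        flags = [1 if key in present else 0 for key in keys]
        result.append([category_id, category_name] + flags)
    return header, result


header, result = pivot_flags(table3, table1_keys)
print(header)
for row in result:
    print(row)
```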

Answer 1:

Below is for BigQuery Standard SQL and incorporates all the new nuances of your case. I hope you can make this cleaned-up version work for you. (Note: it is the same logic as in my initial answer, with some corrections: column names that start with digits are handled by the col_ prefix, the correct column name in Table1 is used, etc.)

DECLARE statement STRING;
-- Build the pivot SELECT list: one IF(... IN UNNEST(keys), 1, 0) column per key.
-- Aliases get a col_ prefix because an unquoted identifier cannot start with a digit.
SET statement = (
  SELECT '''
  SELECT 
    category_id,
    category_name, ''' || (
    SELECT STRING_AGG('IF("' || ids || '" IN UNNEST(keys), 1, 0) AS col_' || ids, ', ') 
    FROM `your_project.your_dataset.keys` 
    ) || ''' FROM task1, UNNEST([STRUCT(SPLIT(associated_keys, "|") AS keys)])'''
);
-- Run the generated SELECT on top of the Task 1 aggregation (one row per category).
EXECUTE IMMEDIATE '''
WITH task1 AS (
  SELECT category_id, category_name, STRING_AGG(associated_keys, '|') AS associated_keys
  FROM `your_project.your_dataset.data` 
  GROUP BY category_id, category_name
)''' || statement;

The above assumes the following table structures (and sample data):

`your_project.your_dataset.keys` AS (
    SELECT 'AAB12B34' ids UNION ALL
    SELECT '34DE5WCC' UNION ALL
    SELECT 'SEF5C6T4' UNION ALL
    SELECT 'SQA7ZZ87' UNION ALL
    SELECT '24NO3PLM' UNION ALL
    SELECT 'X34YY78Z' 
  ), `your_project.your_dataset.data` AS (
    SELECT 111 category_id, 'Books' category_name, '34DE5WCC|SQA7ZZ87|24NO3PLM|SQA7ZZ87|sample300|sample300' associated_keys UNION ALL
    SELECT 222, 'Office', '24NO3PLM|AAB12B34|X34YY78Z' UNION ALL
    SELECT 444, 'Furniture', 'X34YY78Z|24NO3PLM|SQA7ZZ87|SEF5C6T4|34DE5WCC|AAB12B34|sample200' UNION ALL
    SELECT 222, 'Office', 'X34YY78Z|sample100' UNION ALL
    SELECT 111, 'Books', 'AAB12B34' 
  )

Applied to the sample data above, the result is:

Row category_id category_name   col_AAB12B34    col_34DE5WCC    col_SEF5C6T4    col_SQA7ZZ87    col_24NO3PLM    col_X34YY78Z     
1   111         Books           1               1               0               1               1               0    
2   222         Office          1               0               0               0               1               1    
3   444         Furniture       1               1               1               1               1               1
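To see what EXECUTE IMMEDIATE actually runs, here is a small Python sketch (plain string building, not BigQuery code) that reproduces the concatenation done by the SET statement block for the six sample ids above. The helper name build_pivot_sql is hypothetical and the whitespace is simplified; the col_ prefix is what keeps aliases such as col_34DE5WCC valid.

```python
# Key values from the sample keys table (column ids), in sample order
ids = ["AAB12B34", "34DE5WCC", "SEF5C6T4", "SQA7ZZ87", "24NO3PLM", "X34YY78Z"]


def build_pivot_sql(key_values):
    """Hypothetical helper mirroring the SET statement = (...) concatenation."""
    # One flag column per key, joined with ', ' as STRING_AGG(..., ', ') does
    select_list = ", ".join(
        'IF("' + k + '" IN UNNEST(keys), 1, 0) AS col_' + k for k in key_values
    )
    return (
        "SELECT category_id, category_name, "
        + select_list
        + ' FROM task1, UNNEST([STRUCT(SPLIT(associated_keys, "|") AS keys)])'
    )


statement = build_pivot_sql(ids)
print(statement)
```

In the real script, the order of the generated columns follows whatever order STRING_AGG produces, since the answer does not specify an ORDER BY.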

Answer 2:

Below is for BigQuery Standard SQL

Task 1

#standardSQL
SELECT category_id, category_name, STRING_AGG(associated_keys, '|') AS associated_keys
FROM `project.dataset.data` 
GROUP BY category_id, category_name

Applied to the sample data from your question, as in the example below:

#standardSQL
WITH `project.dataset.data` AS (
  SELECT 111 category_id, 'Books' category_name, 'CC34DE5W|SQA7ZZ87|LM24NO3P' associated_keys UNION ALL
  SELECT 222, 'Office', 'LM24NO3P|AAB12B34' UNION ALL
  SELECT 444, 'Furniture', 'X34YY78Z|LM24NO3P|SQA7ZZ87|SEF5C6T4|CC34DE5W|AAB12B34' UNION ALL
  SELECT 222, 'Office', 'X34YY78Z' 
)
SELECT category_id, category_name, STRING_AGG(associated_keys, '|') AS associated_keys
FROM `project.dataset.data` 
GROUP BY category_id, category_name

The output is:

Row category_id category_name   associated_keys  
1   111         Books           CC34DE5W|SQA7ZZ87|LM24NO3P   
2   222         Office          LM24NO3P|AAB12B34|X34YY78Z   
3   444         Furniture       X34YY78Z|LM24NO3P|SQA7ZZ87|SEF5C6T4|CC34DE5W|AAB12B34

Task 2: this uses BigQuery's scripting feature to build the needed pivot statement dynamically, so it does not depend on the number or names of the keys and you don't have to write the SELECT list by hand.

DECLARE statement STRING;
SET statement = (
  WITH task1 AS (
    SELECT category_id, category_name, STRING_AGG(associated_keys, '|') AS associated_keys
    FROM `project.dataset.data` 
    GROUP BY category_id, category_name
  )
  SELECT '''
  SELECT 
    category_id,
    category_name, ''' || (
    SELECT STRING_AGG('IF("' || key || '" IN UNNEST(keys), 1, 0) AS ' || key, ', ') 
    FROM `project.dataset.keys` 
    ) || ''' FROM task1, UNNEST([STRUCT(SPLIT(associated_keys, "|") AS keys)])'''
);
EXECUTE IMMEDIATE '''
WITH task1 AS (
  SELECT category_id, category_name, STRING_AGG(associated_keys, '|') AS associated_keys
  FROM `project.dataset.data` 
  GROUP BY category_id, category_name
)''' || statement;

When applied to the sample data from your question, the output is:

Row category_id category_name   AAB12B34    CC34DE5W    SEF5C6T4    SQA7ZZ87    LM24NO3P    X34YY78Z     
1   111         Books           0           1           0           1           1           0    
2   222         Office          1           0           0           0           1           1    
3   444         Furniture       1           1           1           1           1           1

Answer 3:

You can try this:

Task 1:

SELECT
  category_id, category_name, STRING_AGG(associated_keys, "|") AS associated_keys
FROM
  dataset.table2
GROUP BY 
  category_id, category_name;

Task 2:

Standard SQL Option

select category_id,
       category_name,
       keys,
       IF(keys like '%AAB12B34%', 1, 0) as AAB12B34,
       IF(keys like '%CC34DE5W%', 1, 0) as CC34DE5W,
       IF(keys like '%SEF5C6T4%', 1, 0) as SEF5C6T4,
       IF(keys like '%SQA7ZZ87%', 1, 0) as SQA7ZZ87,
       IF(keys like '%LM24NO3P%', 1, 0) as LM24NO3P,
       IF(keys like '%X34YY78Z%', 1, 0) as X34YY78Z
from (
  -- Task 1: one row per category with all keys merged into one string
  select category_id, category_name, STRING_AGG(associated_keys, "|") as keys
  from dataset.table2
  group by category_id, category_name
);
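The LIKE '%KEY%' test above is a substring match on the merged string, whereas splitting on | and checking membership compares whole keys. With the question's keys (distinct and all eight characters long) the two agree, but they can diverge when one key is contained in another. This Python sketch shows the difference locally; the values "AB12" and "XAB123" are made up for illustration.

```python
def like_flag(associated_keys, key):
    """Substring test, analogous to IF(keys LIKE '%KEY%', 1, 0)."""
    return 1 if key in associated_keys else 0


def exact_flag(associated_keys, key):
    """Whole-key test, analogous to IF(key IN UNNEST(SPLIT(keys, '|')), 1, 0)."""
    return 1 if key in associated_keys.split("|") else 0


# Hypothetical data: "AB12" occurs inside "XAB123" but is not itself a listed key
row_keys = "XAB123|CC34DE5W"
print(like_flag(row_keys, "AB12"))   # 1 (false positive)
print(exact_flag(row_keys, "AB12"))  # 0
```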

Dynamic SQL Option

Create a View:

CREATE OR REPLACE VIEW
  dataset.vw_table2 AS
SELECT
  category_id, category_name, STRING_AGG(associated_keys, '|') AS associated_keys
FROM
  dataset.table2
GROUP BY category_id, category_name;

Create Dynamic SQL:

DECLARE
  dynamicsql STRING;
SET
  dynamicsql = (
  SELECT
    'select category_id, category_name, ' || STRING_AGG('IF(associated_keys LIKE "%' || keys || '%", 1, 0) AS ' || keys, ',') || ' from dataset.vw_table2'
  FROM
    dataset.table1);
EXECUTE IMMEDIATE
  dynamicsql;


Source: https://stackoverflow.com/questions/63642654/restructure-table-and-check-for-values

Tags

sql

google-bigquery