How to get substring for filter and group by clause in AWS Redshift database

Question

How can I extract substrings from a column that holds delimited values, so that I can use them in filter and GROUP BY clauses in an AWS Redshift database?

I have a table with records like:

Table_Id | Categories         | Value
<ID>     | ABC1; ABC1-1; XYZ  | 10
<ID>     | ABC1; ABC1-2; XYZ  | 15
<ID>     | XYZ                | 5
.....

Now I want to filter records by individual category, such as 'ABC1' or 'ABC1 and XYZ'.

The expected output from the query would look like:

Table_Id | Categories         | Value
<ID>     | ABC1               | 25
<ID>     | ABC1-1             | 10
<ID>     | ABC1-2             | 15
<ID>     | XYZ                | 30
.....

So I need to group the results by individual category.

Answer 1:

If you have at most 3 values in any "categories" cell, you can unnest the cells, get the list of unique values, and use that list in a join condition like this:

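WITH
-- named category_list rather than values to avoid any clash with the VALUES keyword
category_list as (
    select distinct category
    from (
            -- trim() drops the space that follows each ';' in the sample data
            select trim(split_part(categories,';',1)) as category from your_table
            union select trim(split_part(categories,';',2)) from your_table
            union select trim(split_part(categories,';',3)) from your_table
     ) as parts
     where nullif(category,'') is not null
)
SELECT
 t2.category
,sum(t1.value) as total_value
FROM your_table t1
JOIN category_list t2
ON trim(split_part(t1.categories,';',1))=t2.category
OR trim(split_part(t1.categories,';',2))=t2.category
OR trim(split_part(t1.categories,';',3))=t2.category
-- the aggregate needs a GROUP BY on the selected category
GROUP BY t2.category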

If you have more than 3 options, just add another split_part level in both the WITH part and the join condition.
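Adding split_part levels by hand gets tedious as the maximum number of categories grows. As a rough sketch only, a small Python helper can generate the query text for any number of levels. The function name build_category_sum_query, the CTE name category_list and the alias total_value are hypothetical, and the generated SQL is a variant of the query above that also trims the space after each ';' and adds the GROUP BY that the summed output needs.

```python
def build_category_sum_query(table: str, max_parts: int) -> str:
    """Return SQL that sums value per individual category.

    max_parts is the largest number of ';'-separated entries in any cell.
    The table name is interpolated as-is, so pass only trusted identifiers.
    """
    if max_parts < 1:
        raise ValueError("max_parts must be at least 1")

    def part(prefix: str, n: int) -> str:
        # One split_part level, trimmed to drop the space after ';'
        return f"trim(split_part({prefix}categories, ';', {n}))"

    levels = range(1, max_parts + 1)
    union_sql = "\n        union ".join(
        f"select {part('', n)} as category from {table}" for n in levels
    )
    join_sql = "\n   or ".join(f"{part('t1.', n)} = t2.category" for n in levels)

    return (
        "with category_list as (\n"
        "    select distinct category from (\n"
        f"        {union_sql}\n"
        "    ) as parts\n"
        "    where nullif(category, '') is not null\n"
        ")\n"
        "select t2.category, sum(t1.value) as total_value\n"
        f"from {table} t1\n"
        "join category_list t2\n"
        f"   on {join_sql}\n"
        "group by t2.category"
    )


# Example: cells that hold up to four categories
print(build_category_sum_query("your_table", 4))
```

The printed text can be pasted into a SQL client. The helper only builds a string and does not connect to Redshift.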

Answer 2:

@JonScott, @AlexYes and anyone else struggling with a similar situation:

I found a better approach than the one suggested by @AlexYes.

I flattened the categories column so that each category becomes an individual record, which I can then process further.

Query:

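-- numbers is assumed to be a helper table with an integer column n holding 1, 2, 3, ...
select row_number() over(order by 1) as r1,
        to_char(timestamptz 'epoch' + date_time * interval '1 second', 'yyyy-mm-dd') AS DAY,
        -- trim() drops the space that follows each ';' in the sample data
        trim(split_part(categories, ';', numbers.n)) as catg,
        value
    from <TABLE>
    join numbers
    on numbers.n <= regexp_count(categories, ';') + 1 <OTHER_CONDITIONS>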

Explanation:

Two functions are useful here. First, split_part takes a string, splits it on the ';' delimiter, and returns the nth value (first, second, and so on) from the split string. Second, regexp_count tells us how many times a particular pattern is found in the string, so the number of ';' characters plus one is the number of categories in a record. Joining to numbers on n <= that count therefore produces one output row per category.
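To see how the two functions and the numbers join interact, here is a minimal Python emulation run on the sample rows from the question. It is only an illustration of the logic, not Redshift code: the helpers split_part and flatten are hypothetical stand-ins, and numbers is assumed to be a sequence of integers starting at 1.

```python
def split_part(text: str, delimiter: str, n: int) -> str:
    """Emulate SQL split_part: 1-based index, empty string past the last part."""
    parts = text.split(delimiter)
    return parts[n - 1] if 1 <= n <= len(parts) else ""


def flatten(rows, numbers):
    """Emulate the join against numbers: one output row per category per record."""
    out = []
    for categories, value in rows:
        # Same idea as regexp_count(categories, ';') + 1
        part_count = categories.count(";") + 1
        for n in numbers:
            if n <= part_count:
                # strip() plays the role of trim() around split_part
                out.append((split_part(categories, ";", n).strip(), value))
    return out


rows = [
    ("ABC1; ABC1-1; XYZ", 10),
    ("ABC1; ABC1-2; XYZ", 15),
    ("XYZ", 5),
]
flat = flatten(rows, numbers=range(1, 11))
for catg, value in flat:
    print(catg, value)
```

The three sample records become seven flattened rows, which can then be filtered or grouped by category like any ordinary column.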

Answer 3:

To do this fully dynamically, you need to transpose or pivot the values in the "categories" column into separate rows. Unfortunately, a "fully dynamic" solution (one that does not know the distinct values beforehand) is NOT possible in Redshift.

Your options are as follows:

1. Use the method suggested by AlexYes in another answer. This is semi-dynamic and is probably your best option.
2. Outside of Redshift, run some ETL code to turn the column into multiple rows.
3. Create a hardcoded solution and perform the pivot with something like this:

select table_id, 'ABC1' as category,
       case when concat(Categories,';') ilike '%ABC1;%' then value else 0 end as value
from your_table
union all
select table_id, 'ABC1-1' as category,
       case when concat(Categories,';') ilike '%ABC1-1;%' then value else 0 end as value
from your_table
union all

etc
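For the ETL option listed above, a minimal sketch in Python might look like the following. It assumes the rows have already been fetched as (table_id, categories, value) tuples. The function names explode and sum_by_category and the ids 1 to 3 are hypothetical placeholders, since the question does not show real ids.

```python
from collections import defaultdict


def explode(rows):
    """Yield one (table_id, category, value) tuple per category in each row."""
    for table_id, categories, value in rows:
        for category in categories.split(";"):
            category = category.strip()
            if category:  # skip empty entries, e.g. after a trailing ';'
                yield table_id, category, value


def sum_by_category(rows):
    """Total the value column per individual category."""
    totals = defaultdict(int)
    for _table_id, category, value in explode(rows):
        totals[category] += value
    return dict(totals)


# Sample rows from the question; the ids are made up for illustration
sample = [
    (1, "ABC1; ABC1-1; XYZ", 10),
    (2, "ABC1; ABC1-2; XYZ", 15),
    (3, "XYZ", 5),
]
totals = sum_by_category(sample)
for category in sorted(totals):
    print(category, totals[category])
```

With the sample rows this gives ABC1 = 25, ABC1-1 = 10, ABC1-2 = 15 and XYZ = 30, matching the expected output in the question. The exploded rows could also be loaded back into a Redshift table and grouped there.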

Source: https://stackoverflow.com/questions/49994931/how-to-get-substring-for-filter-and-group-by-clause-in-aws-redshift-database

Tags

sql

split

amazon-redshift