How to get substring for filter and group by clause in AWS Redshift database

a 夏天 submitted on 2019-12-20 05:42:28

Question


How can I extract substrings from a column that holds delimited values, so that I can use them in filter and GROUP BY clauses in an AWS Redshift database?

I have a table with records like:

Table_Id | Categories         | Value
<ID>     | ABC1; ABC1-1; XYZ  | 10
<ID>     | ABC1; ABC1-2; XYZ  | 15
<ID>     | XYZ                | 5
.....

Now I want to filter records by individual category, such as 'ABC1', or by a combination such as 'ABC1 and XYZ'.

The expected output of the query would look like this:

Table_Id | Categories         | Value
<ID>     | ABC1               | 25
<ID>     | ABC1-1             | 10
<ID>     | ABC1-2             | 15
<ID>     | XYZ                | 30
.....

So I need to group the results by individual category.
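To pin down what is being asked, here is the same split-and-sum logic in plain Python, outside any database. It is only a sketch of the intended semantics: it assumes cells are separated by ';' with optional surrounding spaces, and `sum_by_category` and `filter_rows` are hypothetical helper names, not part of any library.

```python
from collections import defaultdict

# Sample rows from the question as (categories, value); Table_Id is omitted here.
ROWS = [
    ("ABC1; ABC1-1; XYZ", 10),
    ("ABC1; ABC1-2; XYZ", 15),
    ("XYZ", 5),
]


def sum_by_category(rows):
    """Explode each ';'-separated cell into one entry per category and sum the values."""
    totals = defaultdict(int)
    for categories, value in rows:
        for part in categories.split(";"):
            category = part.strip()  # the sample data has a space after each ';'
            if category:  # skip empty parts, e.g. from a trailing ';'
                totals[category] += value
    return dict(totals)


def filter_rows(rows, required):
    """Keep only the rows whose cell contains every category in `required`."""
    return [
        (categories, value)
        for categories, value in rows
        if set(required) <= {p.strip() for p in categories.split(";")}
    ]
```

For the sample rows, `sum_by_category(ROWS)` gives ABC1 = 25, ABC1-1 = 10, ABC1-2 = 15 and XYZ = 30, matching the expected output, and `filter_rows(ROWS, ["ABC1", "XYZ"])` keeps the first two rows.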


Answer 1:


If there are at most 3 values in any "categories" cell, you can unnest the cells, get the list of unique values, and use that list in a join condition like this:

WITH
category_list as (
    -- trim() removes the space that follows each ';' in the sample data
    select distinct category
    from (
            select trim(split_part(categories,';',1)) as category from your_table
            union select trim(split_part(categories,';',2)) from your_table
            union select trim(split_part(categories,';',3)) from your_table
     ) parts
     where nullif(category,'') is not null
)
SELECT
 t2.category
,sum(t1.value)
FROM your_table t1
JOIN category_list t2
ON trim(split_part(t1.categories,';',1))=t2.category
OR trim(split_part(t1.categories,';',2))=t2.category
OR trim(split_part(t1.categories,';',3))=t2.category
GROUP BY t2.category

If a cell can hold more than 3 values, just add another split_part level to both the WITH clause and the join condition.
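The query can be sanity-checked locally without a Redshift cluster. The sketch below assumes Python's built-in sqlite3 module as a stand-in for Redshift: SQLite has no SPLIT_PART, so a Python function approximating Redshift's behaviour (1-based position, empty string past the last part) is registered under that name. It runs a variant of the query with trim() around each split_part, because the sample cells have a space after every ';', and with the GROUP BY t2.category that the aggregate requires. The table_id values 1 to 3 are made up.

```python
import sqlite3


def split_part(string, delimiter, position):
    """Approximation of Redshift SPLIT_PART: 1-based, '' when position is past the end."""
    if string is None:
        return None
    if position < 1:
        raise ValueError("position must be >= 1")
    parts = string.split(delimiter)
    return parts[position - 1] if position <= len(parts) else ""


def run_answer1_query(rows):
    conn = sqlite3.connect(":memory:")
    # Register the UDF so the SQL text can call split_part() as it would in Redshift.
    conn.create_function("split_part", 3, split_part)
    conn.execute("create table your_table (table_id integer, categories text, value integer)")
    conn.executemany("insert into your_table values (?, ?, ?)", rows)
    query = """
        WITH category_list AS (
            SELECT DISTINCT category
            FROM (
                SELECT trim(split_part(categories, ';', 1)) AS category FROM your_table
                UNION SELECT trim(split_part(categories, ';', 2)) FROM your_table
                UNION SELECT trim(split_part(categories, ';', 3)) FROM your_table
            ) parts
            WHERE nullif(category, '') IS NOT NULL
        )
        SELECT t2.category, sum(t1.value)
        FROM your_table t1
        JOIN category_list t2
          ON trim(split_part(t1.categories, ';', 1)) = t2.category
          OR trim(split_part(t1.categories, ';', 2)) = t2.category
          OR trim(split_part(t1.categories, ';', 3)) = t2.category
        GROUP BY t2.category
        ORDER BY t2.category
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


# Sample data from the question, with hypothetical table_id values.
SAMPLE = [
    (1, "ABC1; ABC1-1; XYZ", 10),
    (2, "ABC1; ABC1-2; XYZ", 15),
    (3, "XYZ", 5),
]
```

Calling `run_answer1_query(SAMPLE)` returns one (category, total) pair per distinct category, which is the grouping the question asks for.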




Answer 2:


@JonScott, @AlexYes, and anyone else struggling with a similar situation:

I found a better approach than the one suggested by @AlexYes.

I flattened the categories column so that each category becomes its own record, which I can then process further.

Query:

-- assumes a helper table numbers(n) holding 1, 2, 3, ... up to the max categories per cell
select row_number() over(order by 1) as r1,
        to_char(timestamptz 'epoch' + date_time * interval '1 second', 'yyyy-mm-dd') AS DAY,
        trim(split_part(categories, ';', numbers.n)) as catg,
        value
    from <TABLE>
    join numbers
    on numbers.n <= regexp_count(categories, ';') + 1 <OTHER_CONDITIONS>

Explanation:

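Two functions are useful here. First, split_part(string, delimiter, n) splits the string on the delimiter (';' here) and returns the nth part. Second, regexp_count(string, pattern) returns how many times the pattern occurs in the string, so regexp_count(categories, ';') + 1 is the number of categories in a cell. Joining to the numbers table on numbers.n <= that count therefore yields one row per category, and split_part picks out the category for each row.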
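The numbers-table flattening can be tried locally the same way. Again this is a sketch, assuming sqlite3 stands in for Redshift, with Python approximations of split_part and regexp_count registered as SQL functions, and a hypothetical numbers table holding the integers 1 to 10 (it needs at least as many rows as the largest number of categories in one cell). The epoch-to-date conversion and row_number() from the query above are left out.

```python
import re
import sqlite3


def split_part(string, delimiter, position):
    """Approximation of Redshift SPLIT_PART (1-based, '' past the last part)."""
    if string is None:
        return None
    parts = string.split(delimiter)
    return parts[position - 1] if position <= len(parts) else ""


def regexp_count(string, pattern):
    """Approximation of Redshift REGEXP_COUNT: number of non-overlapping matches."""
    if string is None:
        return None
    return len(re.findall(pattern, string))


def make_db(rows, max_parts=10):
    conn = sqlite3.connect(":memory:")
    conn.create_function("split_part", 3, split_part)
    conn.create_function("regexp_count", 2, regexp_count)
    conn.execute("create table your_table (table_id integer, categories text, value integer)")
    conn.executemany("insert into your_table values (?, ?, ?)", rows)
    # Hypothetical helper table: one row per integer 1..max_parts.
    conn.execute("create table numbers (n integer)")
    conn.executemany("insert into numbers values (?)", [(i,) for i in range(1, max_parts + 1)])
    return conn


# One output row per (source row, category): n runs from 1 to (number of ';') + 1.
FLATTEN = """
    SELECT t.table_id, trim(split_part(t.categories, ';', numbers.n)) AS catg, t.value
    FROM your_table t
    JOIN numbers ON numbers.n <= regexp_count(t.categories, ';') + 1
"""


def flatten(conn):
    return conn.execute(FLATTEN + " ORDER BY t.table_id, numbers.n").fetchall()


def totals(conn):
    # Group the flattened rows by category and sum their values.
    sql = f"SELECT catg, sum(value) FROM ({FLATTEN}) GROUP BY catg ORDER BY catg"
    return conn.execute(sql).fetchall()
```

Unlike the fixed split_part levels in the join-based approach, the number of parts per cell here is limited only by the size of the numbers table.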




Answer 3:


To do this fully dynamically, you need to transpose (pivot) the values in the "categories" column into separate rows. Unfortunately, a "fully dynamic" solution (one that works without knowing the distinct values beforehand) is NOT possible in Redshift.

Your options are as follows:

  1. Use the method suggested by AlexYes in another answer. This is semi-dynamic and is probably your best option.

  2. Outside of Redshift, run ETL code that splits the column into multiple rows.

  3. Create a hardcoded solution and perform the pivot with something like this:

    select table_id, 'ABC1' as category,
           case when concat(Categories, ';') ilike '%ABC1;%' then value else 0 end as value
    from your_table
    union all
    select table_id, 'ABC1-1' as category,
           case when concat(Categories, ';') ilike '%ABC1-1;%' then value else 0 end as value
    from your_table
    union all

etc., with one branch per known category.
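The hardcoded pivot can also be checked in SQLite, assuming LIKE as a stand-in for Redshift's ILIKE (SQLite's LIKE is case-insensitive for ASCII by default) and || in place of concat(). The hypothetical `build_pivot_sql` generates one union all branch per category in a fixed list, and the outer query then sums per category.

```python
import sqlite3

# Hardcoded list of known categories: this approach requires knowing them up front.
KNOWN_CATEGORIES = ["ABC1", "ABC1-1", "ABC1-2", "XYZ"]


def build_pivot_sql(categories):
    """One UNION ALL branch per category, mirroring the hand-written query."""
    branch = (
        "select table_id, '{c}' as category, "
        "case when (categories || ';') like '%{c};%' then value else 0 end as value "
        "from your_table"
    )
    return "\nunion all\n".join(branch.format(c=c) for c in categories)


def pivot_totals(rows, categories=KNOWN_CATEGORIES):
    conn = sqlite3.connect(":memory:")
    conn.execute("create table your_table (table_id integer, categories text, value integer)")
    conn.executemany("insert into your_table values (?, ?, ?)", rows)
    # Sum the pivoted rows per category; non-matching branches contribute 0.
    sql = (
        f"select category, sum(value) from ({build_pivot_sql(categories)}) "
        "group by category order by category"
    )
    result = conn.execute(sql).fetchall()
    conn.close()
    return result
```

Because the category names are pasted into the SQL text, this only suits a fixed, trusted list. Note also that a pattern like '%ABC1;%' matches any longer name ending in ABC1 (e.g. XABC1), so the list must not contain such overlapping names.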



Source: https://stackoverflow.com/questions/49994931/how-to-get-substring-for-filter-and-group-by-clause-in-aws-redshift-database
