SQL unique combinations

跟風遠走 提交于 2020-02-04 04:09:11

问题


I have a table with three columns with an ID, a therapeutic class, and then a generic name. A therapeutic class can be mapped to multiple generic names.

ID     therapeutic_class       generic_name
1           YG4                    insulin
1           CJ6                    maleate
1           MG9                    glargine
2           C4C                    diaoxy
2           KR3                    supplies
3           YG4                    insuilin
3           CJ6                    maleate
3           MG9                    glargine

I need to first look at the individual combinations of therapeutic class and generic name and then want to count how many patients have the same combination. I want my output to have three columns: one being the combo of generic names, the combo of therapeutic classes and the count of the number of patients with the combination like this:

Count          Combination_generic                   combination_therapeutic
2              insulin, maleate, glargine                 YG4, CJ6, MG9
1              supplies, diaoxy                           C4C, KR3

回答1:


One way to match patients by the sets of pairs (therapeutic_class, generic_name) is to create the comma-separated strings in your desired output, and to group by them and count. To do this right, you need a way to identify the pairs. See my Comment under the original question and my Comments to Gordon's Answer to understand some of the issues.

I do this identification in some preliminary work in the solution below. As I mentioned in my Comment, it would be better if the pairs and unique ID's existed already in your data model; I create them on the fly.

Important note: This assumes the comma-separated lists don't become too long. If you exceed 4000 characters (or approx. 32000 characters in Oracle 12, with certain options turned on), you CAN aggregate the strings into CLOBs, but you CAN'T GROUP BY CLOBs (in general, not just in this case), so this approach will fail. A more robust approach is to match the sets of pairs, not some aggregation of them. The solution is more complicated, I will not cover it unless it is needed in your problem.

with
         -- Begin simulated data (not part of the solution)
         test_data ( id, therapeutic_class, generic_name ) as (
           select 1, 'GY6', 'insulin'  from dual union all
           select 1, 'MH4', 'maleate'  from dual union all
           select 1, 'KJ*', 'glargine' from dual union all
           select 2, 'GY6', 'supplies' from dual union all
           select 2, 'C4C', 'diaoxy'   from dual union all
           select 3, 'GY6', 'insulin'  from dual union all
           select 3, 'MH4', 'maleate'  from dual union all
           select 3, 'KJ*', 'glargine' from dual
         ),
         -- End of simulated data (for testing purposes only).
         -- SQL query solution continues BELOW THIS LINE
     valid_pairs ( pair_id, therapeutic_class, generic_name ) as (
       select rownum, therapeutic_class, generic_name
       from   (
                select distinct therapeutic_class, generic_name
                from   test_data
              )
     ),
     first_agg ( id, tc_list, gn_list ) as (
       select t.id, 
              listagg(p.therapeutic_class, ',') within group (order by p.pair_id),
              listagg(p.generic_name     , ',') within group (order by p.pair_id)
       from   test_data t join valid_pairs p
                           on t.therapeutic_class = p.therapeutic_class
                          and t.generic_name      = p.generic_name
       group by t.id
     )
select   count(*) as cnt, tc_list, gn_list
from     first_agg
group by tc_list, gn_list
;

Output:

CNT TC_LIST            GN_LIST                      
--- ------------------ ------------------------------
  1 GY6,C4C            supplies,diaoxy               
  2 GY6,KJ*,MH4        insulin,glargine,maleate     



回答2:


You are looking for listagg() and then another aggregation. I think:

select therapeutics, generics, count(*)
from (select id, listagg(therapeutic_class, ', ') within group (order by therapeutic_class) as therapeutics,
             listagg(generic_name, ', ') within group (order by generic_name) as generics
      from t
      group by id
     ) t
group by therapeutics, generics;


来源:https://stackoverflow.com/questions/44811583/sql-unique-combinations

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!