Oracle LISTAGG: top 3 most frequent values, given in one column, grouped by ID

Submitted by 喜你入骨 on 2019-12-06 03:37:43

Here is some sample data:

-- 100 visits spread over animals 0..4, with a random sickness code 1..10 each
create table VET as
select
  rownum+1 Visit_Id,
  mod(rownum+1,5) Animal_id,
  cast(NULL as number) Veterinarian_id,
  trunc(10*dbms_random.value)+1 Sickness_code
from dual
connect by level <= 100;

Query

Basically, the subqueries do the following:

Aggregate the visit count per animal and sickness code, and calculate the flu column (sickness code 5) over all records of the animal

Calculate RANK (if you really need only 3 records, use ROW_NUMBER; see the discussion below)

Filter the top 3 RANKs

LISTAGG the result

with agg as (
  select Animal_id, Sickness_code, count(*) cnt,
         sum(case when SICKNESS_CODE = 5 then 1 else 0 end) over (partition by animal_id) as cnt_flu
  from vet
  group by Animal_id, Sickness_code
), agg2 as (
  select ANIMAL_ID, SICKNESS_CODE, CNT, cnt_flu,
         rank() OVER (PARTITION BY ANIMAL_ID ORDER BY cnt DESC) rnk
  from agg
), agg3 as (
  select ANIMAL_ID, SICKNESS_CODE, CNT, CNT_FLU, RNK
  from agg2
  where rnk <= 3
)
select ANIMAL_ID, max(CNT_FLU) CNT_FLU,
       LISTAGG(SICKNESS_CODE||'('||CNT||')', ', ') WITHIN GROUP (ORDER BY rnk) as cnt_lts
from agg3
group by ANIMAL_ID
order by 1;

This gives (the sample data is random, so the exact counts will vary):

 ANIMAL_ID    CNT_FLU CNT_LTS                                     
---------- ---------- ---------------------------------------------
         0          1 6(5), 1(4), 9(3)                              
         1          1 1(5), 3(4), 2(3), 8(3)                        
         2          0 1(5), 10(3), 4(3), 6(3), 7(3)                 
         3          1 5(4), 2(3), 4(3), 7(3)                        
         4          1 2(5), 10(4), 1(2), 3(2), 5(2), 7(2), 8(2) 

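I intentionally show Sickness_code(visit count) to demonstrate that the top 3 can contain ties, which you have to handle. That is why the query uses the RANK function: tied codes share a rank, so an animal can return more than three entries. ROW_NUMBER always returns exactly three, but which of the tied codes it keeps is not deterministic unless the ORDER BY includes a tie-breaker.

Note that CNT_FLU as written is only 0 or 1. The analytic SUM runs over the already grouped rows, and each animal has at most one row for code 5, so the column only flags whether the animal ever had sickness code 5 (animal 3 shows 5(4) but CNT_FLU = 1). To get the number of flu visits, sum count(*) instead of 1 in the CASE expression.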
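If exactly three codes per animal are required, one option is ROW_NUMBER with an explicit tie-breaker. The following is only a sketch: it assumes the VET table from the sample data, and breaking ties by the lower SICKNESS_CODE is an arbitrary choice made for illustration, as are the names ranked, rn and top3.

```sql
with agg as (
  -- one row per animal and sickness code, with its visit count
  select animal_id, sickness_code, count(*) as cnt
  from vet
  group by animal_id, sickness_code
), ranked as (
  select animal_id, sickness_code, cnt,
         -- sickness_code as a second sort key (assumed tie-breaker)
         -- makes the numbering repeatable when counts are equal
         row_number() over (partition by animal_id
                            order by cnt desc, sickness_code) as rn
  from agg
)
select animal_id,
       listagg(sickness_code || '(' || cnt || ')', ', ')
         within group (order by rn) as top3
from ranked
where rn <= 3
group by animal_id
order by animal_id;
```

With this variant every animal returns at most three entries, and rerunning the query on the same data keeps the same three.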

I think the most natural way uses two levels of aggregation, along with a dash of window functions here and there:

select vas.animal,
       sum(case when sickness_code = 5 then cnt else 0 end) as numflu,
       listagg(case when seqnum <= 3 then sickness_code end, ',') within group (order by seqnum) as top3sicknesses
from (select animal, sickness_code, count(*) as cnt,
             row_number() over (partition by animal order by count(*) desc) as seqnum
      from visits
      group by animal, sickness_code
     ) vas
group by vas.animal;

This uses the fact that listagg() ignores NULL values.
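The NULL handling can be checked in isolation. This small sketch does not use the tables above; it generates the numbers 1 to 5 from dual (the names n and first_three are made up for the example) and turns everything above 3 into NULL before aggregating:

```sql
select listagg(case when n <= 3 then n end, ',')
         within group (order by n) as first_three
from (select level as n from dual connect by level <= 5);
```

The result should be the string 1,2,3: the rows for 4 and 5 are still part of the group, but they contribute nothing to the list, not even a separator.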
