问题
I have a problem regarding SQL query , it can be done in "plain" SQL, but as I am sure that I need to use some group concatenation (can't use MySQL) so second option is ORACLE dialect as there will be Oracle database. Let's say we have following entities:
Table: Veterinarian visits
Visit_Id,
Animal_id,
Veterinarian_id,
Sickness_code
Let's say there is 100 visits (100 visit_id) and each animal_id visits around 20 times.
I need to create a SELECT
, grouped by Animal_id with 3 columns
- animal_id
- second shows aggregated amount of flu visits for this particular animal (let's say flu, sickness_code = 5)
- 3rd column shows top three sicknesses codes for each animal (top 3 most often codes for this particular animal_id)
How to do it? First and second columns are easy, but third? I know that I need to use LISTAGG from Oracle, OVER PARTITION BY, COUNT and RANK, I tried to tie it together but didn't work out as I expected :( How should this query look like?
回答1:
Here sample data
create table VET as
select
rownum+1 Visit_Id,
mod(rownum+1,5) Animal_id,
cast(NULL as number) Veterinarian_id,
trunc(10*dbms_random.value)+1 Sickness_code
from dual
connect by level <=100;
Query
basically the subqueries do the following:
aggregate count and calculate flu count (in all records of the animal)
calculate RANK (if you need realy only 3 records use ROW_NUMBER - see discussion below)
Filter top 3 RANKs
LISTAGGregate result
with agg as (
select Animal_id, Sickness_code, count(*) cnt,
sum(case when SICKNESS_CODE = 5 then 1 else 0 end) over (partition by animal_id) as cnt_flu
from vet
group by Animal_id, Sickness_code
), agg2 as (
select ANIMAL_ID, SICKNESS_CODE, CNT, cnt_flu,
rank() OVER (PARTITION BY ANIMAL_ID ORDER BY cnt DESC) rnk
from agg
), agg3 as (
select ANIMAL_ID, SICKNESS_CODE, CNT, CNT_FLU, RNK
from agg2
where rnk <= 3
)
select
ANIMAL_ID, max(CNT_FLU) CNT_FLU,
LISTAGG(SICKNESS_CODE||'('||CNT||')', ', ') WITHIN GROUP (ORDER BY rnk) as cnt_lts
from agg3
group by ANIMAL_ID
order by 1;
gives
ANIMAL_ID CNT_FLU CNT_LTS
---------- ---------- ---------------------------------------------
0 1 6(5), 1(4), 9(3)
1 1 1(5), 3(4), 2(3), 8(3)
2 0 1(5), 10(3), 4(3), 6(3), 7(3)
3 1 5(4), 2(3), 4(3), 7(3)
4 1 2(5), 10(4), 1(2), 3(2), 5(2), 7(2), 8(2)
I intentionally show Sickness_code(count visits) to demonstarte that top 3 can have ties that you should handle.
Check the RANK function. Using ROW_NUMBER
is not deterministic in this case.
回答2:
I think the most natural way uses two levels of aggregation, along with a dash of window functions here and there:
select vas.animal,
sum(case when sickness_code = 5 then cnt else 0 end) as numflu,
listagg(case when seqnum <= 3 then sickness_code end, ',') within group (order by seqnum) as top3sicknesses
from (select animal, sickness_code, count(*) as cnt,
row_number() over (partition by animal order by count(*) desc) as seqnum
from visits
group by animal, sickness_code
) vas
group by vas.animal;
This uses the fact that listagg()
ignores NULL
values.
来源:https://stackoverflow.com/questions/38127914/oracle-listagg-top-3-most-frequent-values-given-in-one-column-grouped-by-id