postgres: get top n occurrences of a value within each group

Posted by 随声附和 on 2019-12-23 19:46:48

Question


I have a simple table like this:

user    letter
--------------
1       A
1       A
1       B
1       B
1       B
1       C

2       A
2       B
2       B
2       C
2       C
2       C

I want to get the top 2 most frequent values of 'letter' per user, like so:

user    letter  rank(within user group)
--------------------
1       B       1
1       A       2

2       C       1
2       B       2

Or, even better, collapsed into columns:

user    1st-most-occurrence  2nd-most-occurrence
1       B                   A
2       C                   B

How can I accomplish this in Postgres?


Answer 1:


-- Count each (user_id, letter) pair, then number the letters within each user
-- from most to least frequent.
with cte as (
    select 
        t.user_id, t.letter,
        row_number() over(partition by t.user_id order by count(*) desc) as row_num
    from Table1 as t
    group by t.user_id, t.letter
)
-- Pivot the top two rows of each user into columns.
select
    c.user_id,
    max(case when c.row_num = 1 then c.letter end) as "1st-most-occurrence",
    max(case when c.row_num = 2 then c.letter end) as "2nd-most-occurrence"
from cte as c
where c.row_num <= 2
group by c.user_id;

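The pivot step in this answer can also be written with the aggregate FILTER clause (available since PostgreSQL 9.4) instead of max(case ...). The following is only a sketch under assumptions, not the answerer's code: the CTE name sample and the column aliases are made up for illustration, the question's data is inlined so the query runs without any table, and letter is added as a secondary sort key so that ties are broken deterministically.

```sql
-- Hypothetical inline copy of the question's data; swap in the real table.
with sample(user_id, letter) as (
    values (1,'A'),(1,'A'),(1,'B'),(1,'B'),(1,'B'),(1,'C'),
           (2,'A'),(2,'B'),(2,'B'),(2,'C'),(2,'C'),(2,'C')
),
ranked as (
    select
        user_id, letter,
        -- "letter" as tie-breaker is an addition, not part of the original answer
        row_number() over (partition by user_id
                           order by count(*) desc, letter) as row_num
    from sample
    group by user_id, letter
)
select
    user_id,
    max(letter) filter (where row_num = 1) as first_most,
    max(letter) filter (where row_num = 2) as second_most
from ranked
where row_num <= 2
group by user_id
order by user_id;
```

On the sample data this should give B, A for user 1 and C, B for user 2, matching the result the question asks for.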




Answer 2:


Something like this:

select *
from (
    select userid, 
           letter, 
           dense_rank() over (partition by userid order by count(*) desc) as rnk
    from letters
    group by userid, letter
) t
where rnk <= 2
order by userid, rnk;

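Note that I replaced user with userid: user is a reserved word in PostgreSQL, so a column with that name has to be double-quoted everywhere, and using reserved words as column names is a bad habit.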

Here is an SQLFiddle: http://sqlfiddle.com/#!12/ec3ec/1
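The choice between row_number(), rank() and dense_rank() only matters when two letters occur equally often. As a rough illustration, the sketch below puts all three side by side on made-up data (not the question's) in which A and B tie for first place; the inline letters CTE and the column aliases are assumptions for the example.

```sql
-- Hypothetical data: A x2, B x2, C x1, so A and B tie for first place.
with letters(userid, letter) as (
    values (1,'A'),(1,'A'),(1,'B'),(1,'B'),(1,'C')
)
select userid,
       letter,
       count(*)          as cnt,
       row_number() over w as row_num,    -- 1 and 2 for A/B, in arbitrary order
       rank()       over w as rnk,        -- 1, 1, 3
       dense_rank() over w as dense_rnk   -- 1, 1, 2
from letters
group by userid, letter
window w as (partition by userid order by count(*) desc)
order by userid, cnt desc, letter;
```

With dense_rank(), the filter rnk <= 2 returns all three letters here, because the tied letters share rank 1 and C gets rank 2. With row_number(), exactly two rows come back, but which of the tied letters is numbered 1 is not defined unless a tie-breaker is added to the ORDER BY.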




Answer 3:


Needed function:

-- Arguments: $1 = array of values, $2 = offset, $3 = limit (offset comes first, despite the name).
CREATE OR REPLACE FUNCTION sortCountLimitOffset(anyarray, int, int)
  RETURNS anyarray AS $$
    select array_agg(x)
    from (select x from (select unnest($1) as x) as t
          group by x order by count(*) desc offset $2 limit $3) t;
  $$ LANGUAGE sql VOLATILE
  COST 100;

Solution 1: (returning all letters concatenated as string)

select
    usr,
    array_to_string(sortCountLimitOffset(array_agg(letter), 0, 5), ',')
from ttt
group by usr;

Output:

 usr | array_to_string
-----+-----------------
   1 | B,A,C
   2 | C,B,A
(2 rows)

Solution 2: (returning each n-th letter in separate column)

select
    usr,
    array_to_string(sortCountLimitOffset(array_agg(letter), 0, 1), ',') letter1,
    array_to_string(sortCountLimitOffset(array_agg(letter), 1, 1), ',') letter2,
    array_to_string(sortCountLimitOffset(array_agg(letter), 2, 1), ',') letter3,
    array_to_string(sortCountLimitOffset(array_agg(letter), 3, 1), ',') letter4,
    array_to_string(sortCountLimitOffset(array_agg(letter), 4, 1), ',') letter5
from ttt
group by usr;

Output:

 usr | letter1 | letter2 | letter3 | letter4 | letter5
-----+---------+---------+---------+---------+---------
   1 | B       | A       | C       |         |
   2 | C       | B       | A       |         |
(2 rows)

The SELECT inside the function could also be inlined at each place where the function is called, but keeping it in a function makes the code easier to reuse and maintain.
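As a rough idea of what an inlined variant of Solution 1 could look like, the sketch below counts the letters in a subquery and concatenates them with string_agg(... order by ...). It is an illustration under assumptions rather than a drop-in replacement: the ttt CTE is an inline stand-in for the real table, the counted CTE name is made up, letter is added as a tie-breaker, and the limit of 5 is left out because the sample has only three distinct letters per user.

```sql
-- Inline stand-in for the ttt table, filled with the question's data.
with ttt(usr, letter) as (
    values (1,'A'),(1,'A'),(1,'B'),(1,'B'),(1,'B'),(1,'C'),
           (2,'A'),(2,'B'),(2,'B'),(2,'C'),(2,'C'),(2,'C')
),
counted as (
    -- one row per (usr, letter) with its number of occurrences
    select usr, letter, count(*) as cnt
    from ttt
    group by usr, letter
)
select usr,
       -- most frequent letter first; "letter" breaks ties (an addition)
       string_agg(letter, ',' order by cnt desc, letter) as letters
from counted
group by usr
order by usr;
```

On the sample data this should produce B,A,C for user 1 and C,B,A for user 2, the same as the output of Solution 1. To cap the list at n letters, a row_number() filter could be applied in counted before aggregating.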



Source: https://stackoverflow.com/questions/18632599/postgres-get-top-n-occurrences-of-a-value-within-each-group
