postgres: get top n occurrences of a value within each group

问题

I have a simple table like this:

user    letter
--------------
1       A
1       A
1       B
1       B
1       B
1       C

2       A
2       B
2       B
2       C
2       C
2       C

I want to get the top 2 occurrences of 'letter' per user, like so

user    letter  rank(within user group)
--------------------
1       B       1
1       A       2

2       C       1
2       B       2

or even better: collapsed into columns

user    1st-most-occurrence  2nd-most-occurrence
1       B                   A
2       C                   B

How can I accomplish this in postgres?

回答1:

with cte as (
    select 
        t.user_id, t.letter,
        row_number() over(partition by t.user_id order by count(*) desc) as row_num
    from Table1 as t
    group by t.user_id, t.letter
)
select
    c.user_id,
    max(case when c.row_num = 1 then c.letter end) as "1st-most-occurance",
    max(case when c.row_num = 2 then c.letter end) as "2st-most-occurance"
from cte as c
where c.row_num <= 2
group by c.user_id

=> sql fiddle demo

回答2:

Something like this:

select *
from (
    select userid, 
           letter, 
           dense_rank() over (partition by userid order by count(*) desc) as rnk
    from letters
    group by userid, letter
) t
where rnk <= 2
order by userid, rnk;

Note that I replaced user with userid because using reserved words for columns is a bad habit.

Here is an SQLFiddle: http://sqlfiddle.com/#!12/ec3ec/1

回答3:

Needed function:

CREATE OR REPLACE FUNCTION sortCountLimitOffset(anyarray, int, int)
  RETURNS anyarray AS 'select array_agg(x) from (select x from (select unnest($1) as x) as t group by x order by count(*) desc offset $2 limit $3) t;'
  LANGUAGE sql VOLATILE
  COST 100;

Solution 1: (returning all letters concatenated as string)

select
    usr,
    array_to_string(sortCountLimitOffset(array_agg(letter), 0, 5), ',')
from ttt
group by usr;

Output:

 usr | array_to_string
-----+-----------------
   1 | B,A,C
   2 | C,B,A
(2 Zeilen)

Solution 2: (returning each n-th letter in separate column)

select
    usr,
    array_to_string(sortCountLimitOffset(array_agg(letter), 0, 1), ',') letter1,
    array_to_string(sortCountLimitOffset(array_agg(letter), 1, 1), ',') letter2,
    array_to_string(sortCountLimitOffset(array_agg(letter), 2, 1), ',') letter3,
    array_to_string(sortCountLimitOffset(array_agg(letter), 3, 1), ',') letter4,
    array_to_string(sortCountLimitOffset(array_agg(letter), 4, 1), ',') letter5
from ttt
group by usr;

Output:

 usr | letter1 | letter2 | letter3 | letter4 | letter5
-----+---------+---------+---------+---------+---------
   1 | B       | A       | C       |         |
   2 | C       | B       | A       |         |
(2 Zeilen)

It would also be possible to inline the SELECT from the function where the function is called. But the way it is now, it is easier to reuse and maintain the code.

来源：https://stackoverflow.com/questions/18632599/postgres-get-top-n-occurrences-of-a-value-within-each-group

标签

sql

postgresql

group-by

sql-order-by

greatest-n-per-group