PostgreSQL: How do I select the top n percent (%) of entries from each group/category?

最后都变了 - posted on 2019-12-21 17:25:36

Question


We are new to Postgres. We have the following query, which selects the top N records from each category.

 create table temp (
     gp char,
     val int
 );

 insert into temp values ('A',10);
 insert into temp values ('A',8);
 insert into temp values ('A',6);
 insert into temp values ('A',4);
 insert into temp values ('B',3);
 insert into temp values ('B',2);
 insert into temp values ('B',1);

 select a.gp,a.val
 from   temp a
 where  a.val in (
              select b.val
              from   temp b
              where  a.gp=b.gp
              order by b.val desc
             limit 2);

The output of the above query looks like this:

 gp   val
 ----------
 A    10
 A    8
 B    3
 B    2

But our requirement is different: we want to select the top n% of records from each category, where n is not fixed but is based on some percentage of the elements in each group.


Answer 1:


To retrieve the rows based on the percentage of the number of rows in each group you can use two window functions: one to count the rows and one to give them a unique number.

select gp,
       val
from (
  select gp, 
         val,
         count(*) over (partition by gp) as cnt,
         row_number() over (partition by gp order by val desc) as rn
  from temp
) t
-- rn and cnt are both bigint, so cast to numeric to avoid integer division
where rn::numeric / cnt <= 0.75;

SQLFiddle example: http://sqlfiddle.com/#!15/94fdd/1
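
If the percentage needs to vary, or you want to control how fractional row counts are rounded, one option is to compare the row number against a per-group cutoff. The following is a minimal sketch, assuming the `temp` table from the question; the fraction 0.5 and the names `params` and `pct` are placeholders for illustration and are not part of the original answer.

```sql
-- Sketch: keep the top fraction of rows per group, rounding the cutoff up
-- so that every non-empty group returns at least one row.
-- The value 0.5 and the names "params" / "pct" are illustrative placeholders.
with params as (
  select 0.5::numeric as pct
)
select t.gp,
       t.val
from (
  select gp,
         val,
         count(*) over (partition by gp) as cnt,
         row_number() over (partition by gp order by val desc) as rn
  from temp
) t
cross join params p
where t.rn <= ceil(t.cnt * p.pct)
order by t.gp, t.val desc;
```

With the sample data, a fraction of 0.5 keeps two rows from each group: ceil(4 * 0.5) = 2 for A and ceil(3 * 0.5) = 2 for B. Using floor instead of ceil would round the cutoff down.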


Btw: using char is almost always a bad idea because it is a fixed-length data type that is padded to the defined length. I hope you only did that for setting up the example and don't use it in your real table.
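
As a small illustration of the padding behavior, the sketch below compares the size of the same string as `char(5)` and as `text`. The table name `temp_text` is hypothetical; it simply mirrors the question's table with `text` in place of `char` (which, without a length, means `char(1)`).

```sql
-- char(n) pads the value with spaces up to n characters; text stores it as given.
-- For this ASCII string, char_bytes is 5 and text_bytes is 2.
select octet_length('ab'::char(5)) as char_bytes,
       octet_length('ab'::text)    as text_bytes;

-- Hypothetical variant of the question's table using text for the group key.
create table temp_text (
    gp  text,
    val int
);
```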




Answer 2:


Building on the response from a_horse_with_no_name, you can achieve something similar using percent_rank():

SELECT
    gp,
    val,
    pct_rank
FROM (
    SELECT
        gp,
        val,
        -- partition by gp so the rank is computed within each group
        percent_rank() over (partition by gp order by val desc) as pct_rank
    FROM temp
    ) t
WHERE pct_rank <= 0.75;

You can then set the final WHERE clause to return data at whatever percent_rank() threshold you require.
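
The row-number ratio and percent_rank() measure "percent" slightly differently. As a sketch for comparison, assuming the `temp` table from the question, the query below shows both next to cume_dist(). percent_rank() is computed as (rank - 1) / (rows in partition - 1), so it starts at 0; cume_dist() counts the current row and its peers, so it is always greater than 0 and ends at 1. The column aliases are illustrative.

```sql
-- Sketch: compare three per-group "percent" measures side by side.
select gp,
       val,
       (row_number() over w)::numeric / count(*) over g as rn_ratio,
       percent_rank() over w as pct_rank,
       cume_dist()    over w as cum_dist
from temp
window w as (partition by gp order by val desc),
       g as (partition by gp)
order by gp, val desc;
```

For the three rows of group B, the row-number ratio is roughly 0.33, 0.67, 1, while percent_rank() yields 0, 0.5, 1. A threshold of 0.5 would therefore keep one row of B with the ratio but two rows with percent_rank(). Tied values also behave differently: row_number() orders ties arbitrarily, whereas percent_rank() and cume_dist() give tied rows the same value.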




Answer 3:


The accepted answer did not work for me. I found this solution, which works for me:

SELECT * FROM temp ORDER BY val DESC
     LIMIT (SELECT (count(*) / 10) AS selnum FROM temp )

It is not optimal in terms of performance, but it works. Note that it returns the top 10% of the whole table, not of each group.
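
One way to apply the same LIMIT idea per group is a lateral subquery, which lets the limit depend on each group's row count. This is only a sketch, assuming the `temp` table from the question and PostgreSQL 9.3 or later (for LATERAL); the fraction 0.5 and the aliases `g`, `t` and `s` are placeholders.

```sql
-- Sketch: per-group LIMIT via a lateral subquery.
-- The fraction 0.5 is an illustrative placeholder.
select s.gp,
       s.val
from (
  -- one row per group with its row count
  select gp, count(*) as cnt
  from temp
  group by gp
) g
cross join lateral (
  -- top rows of the current group; the limit is derived from g.cnt
  select t.gp, t.val
  from temp t
  where t.gp = g.gp
  order by t.val desc
  limit ceil(g.cnt * 0.5)::bigint
) s
order by s.gp, s.val desc;
```

Multiplying by a numeric fraction and rounding with ceil avoids the integer division in count(*) / 10, which yields 0 for groups with fewer than ten rows.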



Source: https://stackoverflow.com/questions/24626036/postgresql-how-do-i-select-top-n-percent-entries-from-each-group-category
