How should I handle “ranked x out of y” data in PostgreSQL?

Submitted by 左心房为你撑大大i on 2019-12-01 05:10:45

If you want the rank, do something like

SELECT id,num,rank FROM (
  SELECT id,num,rank() OVER (ORDER BY num) FROM foo
) AS bar WHERE id=4

Or if you actually want the row number, use

SELECT id,num,row_number FROM (
  SELECT id,num,row_number() OVER (ORDER BY num) FROM foo
) AS bar WHERE id=4

They'll differ when you have equal values somewhere. There is also dense_rank() if you need that.
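To make the difference concrete, here is a small sketch outside the database, in plain Python, that mimics what row_number(), rank() and dense_rank() return over an ascending ORDER BY. The function name window_ranks and the sample values are made up for illustration; PostgreSQL computes these internally and this is not its implementation.

```python
def window_ranks(values):
    """Return (value, row_number, rank, dense_rank) tuples in ascending order."""
    ordered = sorted(values)
    result = []
    rank = dense = 0
    previous = object()  # sentinel that never equals a real value
    for position, value in enumerate(ordered, start=1):
        if value != previous:
            rank = position   # rank() jumps to the current position after a run of ties
            dense += 1        # dense_rank() advances by exactly one per distinct value
            previous = value
        # row_number() is simply the position, ties or not
        result.append((value, position, rank, dense))
    return result


for row in window_ranks([10, 20, 20, 30]):
    print(row)
```

With the two tied values of 20, row_number gives them 2 and 3, rank gives both 2 and then skips to 4, and dense_rank gives both 2 and continues with 3.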

This requires PostgreSQL 8.4 or later, of course.

Isn't it just this:

SELECT  *
FROM    mytable
ORDER BY
        col1
OFFSET X LIMIT 1

Or am I missing something?

Update:

If you want to show the rank, use this:

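SELECT  q.*, values[1] AS rank, values[2] AS total
FROM    (
        SELECT  mo.*,
                (
                -- values[1]: rows that sort before the target row (zero-based rank)
                -- values[2]: total number of rows
                SELECT  ARRAY[SUM(((mi.col1, mi.ctid) < (mo.col1, mo.ctid))::INTEGER), COUNT(*)]
                FROM    mytable mi
                ) AS values
        FROM    mytable mo
        WHERE   mo.id = @myid  -- @myid stands for the id of the row you are looking up
        ) q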

Before version 8.4, ROW_NUMBER-style functionality in PostgreSQL has to be emulated with LIMIT n OFFSET skip.

EDIT: Since you are asking for ROW_NUMBER() rather than simple ranking: row_number() was introduced in PostgreSQL 8.4, so you might consider upgrading. Otherwise this workaround might be helpful.

Previous replies tackle the question "select all rows and get their rank" which is not what you want...

  • you have a row
  • you want to know its rank

Just do:

SELECT count(*) FROM table WHERE score > $1

Here $1 is the score of the row you just selected (you presumably fetched it anyway in order to display it). The count is the number of rows with a higher score, so add 1 if you want a 1-based rank.

Or do:

SELECT a.*, (SELECT count(*) FROM table b WHERE b.score > a.score) AS rank FROM table AS a WHERE pk = ...

However, if you select a row that is ranked last, you have to count every row ranked before it. That means scanning the whole table, which will be very slow.

Solution:

SELECT count(*) FROM (SELECT 1 FROM table WHERE score > $1 LIMIT 30) AS top

You'll get a precise ranking for the 30 best scores, and it will be fast. A result of 30 only tells you the row is ranked 31st or worse. Who cares about the losers?

OK, if you really do care about the losers, you'll need to make a histogram:

Suppose score can go from 0 to 100, and you have 1000000 losers with score < 80 and 10 winners with score > 80.

You make a histogram of how many rows have a score of X. It's a simple, small table with about 100 rows. Add a trigger to your main table to keep the histogram up to date.

Now if you want to rank a loser which has score X, his rank is sum(histo) where histo_score > X (plus one, for a 1-based rank).
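As a rough illustration of the idea, here is the histogram lookup sketched in plain Python rather than as a table plus trigger. The names build_histogram and rank_from_histogram and the sample scores are hypothetical; in the database the histogram would be the small side table described above.

```python
from collections import Counter


def build_histogram(scores):
    """Count how many rows have each score (stands in for the small side table)."""
    return Counter(scores)


def rank_from_histogram(histogram, score):
    """1-based rank: one more than the number of rows with a strictly higher score."""
    higher = sum(count for s, count in histogram.items() if s > score)
    return higher + 1


scores = [95, 90, 90, 70, 50, 50, 50, 10]
histogram = build_histogram(scores)
print(rank_from_histogram(histogram, 90))  # one row (95) is higher
print(rank_from_histogram(histogram, 50))  # four rows (95, 90, 90, 70) are higher
```

The lookup touches one entry per distinct score instead of one per row, which is the whole point when there are a million losers but only about a hundred possible scores.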

Since your score probably isn't between 0 and 100 but (say) between 0 and 1000000000, you'll need to fudge it a bit: enlarge your histogram bins, for instance, so that you only need 100 bins at most, or use some log-histogram distribution function.
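With wider bins the rank is no longer exact, only bounded. A minimal sketch of that trade-off, again in plain Python with made-up names (bin_counts, rank_bounds) and assuming fixed-width bins, integer scores, and that the row being ranked is itself counted in the histogram:

```python
from collections import Counter


def bin_counts(scores, bin_width):
    """Histogram with fixed-width bins: bin index -> number of rows in that bin."""
    return Counter(score // bin_width for score in scores)


def rank_bounds(counts, score, bin_width):
    """Return (best, worst) possible 1-based rank for a row that is in the table."""
    own_bin = score // bin_width
    above = sum(c for b, c in counts.items() if b > own_bin)
    same = counts.get(own_bin, 0)  # includes the row itself
    # Best case: nobody else in the bin beats it. Worst case: everyone else does.
    return above + 1, above + same


scores = [950, 910, 480, 420, 400, 30]
counts = bin_counts(scores, bin_width=100)
print(rank_bounds(counts, 420, bin_width=100))
```

Here 420 shares its bin with 400 and 480, and two rows sit in higher bins, so the rank is somewhere between 3 and 5 (the true rank is 4). Narrower bins tighten the bounds at the cost of a bigger histogram.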

By the way, PostgreSQL does this when you ANALYZE the table, so if you set the statistics target on score to 100 or 1000 (ALTER TABLE ... ALTER COLUMN score SET STATISTICS 1000), run ANALYZE, and then run:

EXPLAIN SELECT * FROM table WHERE score > $1

you'll get a nice rowcount estimate.

Who needs exact answers?
