Selecting top N rows for each group in a table

醉酒成梦 2020-11-30 04:03

I am facing a very common issue: "Selecting top N rows for each group in a table".

Consider a table with the columns id, name, hair_colour, and score.

3 Answers
  •  借酒劲吻你
    2020-11-30 04:48

    The algorithm generates each girl's rank by counting the rows in the cross-product (the table joined to itself) whose score is equal to or greater than hers. Hence, in the problem case you're talking about, Sarah's grid would look like:

    a.name | a.score | b.name  | b.score
    -------+---------+---------+--------
    Sarah  | 9       | Sarah   | 9
    Sarah  | 9       | Deborah | 9
    

    and similarly for Deborah, which is why both girls get a rank of 2 here.
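
    To make the tie behaviour concrete, here is a minimal sketch that runs the greater-than-or-equal count through Python's built-in sqlite3 module. The table layout (girl with id, name, hair, score), the hair value "red", and the third row (Amy) are assumptions made up for illustration; only Sarah and Deborah with a score of 9 come from the example above.

```python
import sqlite3

# In-memory database; the schema and sample rows are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE girl (id INTEGER PRIMARY KEY, name TEXT, hair TEXT, score INTEGER)"
)
conn.executemany(
    "INSERT INTO girl (name, hair, score) VALUES (?, ?, ?)",
    [("Sarah", "red", 9), ("Deborah", "red", 9), ("Amy", "red", 7)],
)

# Rank = number of rows in the same hair group with an equal or higher score.
# Each tied girl counts herself and the other tied girl, so both get 2.
rows = conn.execute(
    """
    SELECT a.name, COUNT(*) AS ranknum
    FROM girl AS a
      INNER JOIN girl AS b ON (a.hair = b.hair) AND (a.score <= b.score)
    GROUP BY a.id
    ORDER BY ranknum, a.name
    """
).fetchall()

ge_ranks = dict(rows)
for name, ranknum in rows:
    print(name, ranknum)
```

    With this data nobody ends up with rank 1, which is the symptom described in the question.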

    The problem is that when there's a tie, this count gives every tied girl the worst position in the tied range (rank 2 here), when you'd want them all to take the best position (rank 1) instead. I think a simple change can fix this:

    Instead of a greater-than-or-equal comparison, use a strict greater-than comparison to count the number of girls who are strictly better. Then add one to that count and you have the rank, which handles ties appropriately. So the inner select would be:

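    SELECT a.id, COUNT(b.id) + 1 AS ranknum
    FROM girl AS a
      -- LEFT JOIN keeps the top scorer of each group, who has no strictly better row
      LEFT JOIN girl AS b ON (a.hair = b.hair) AND (a.score < b.score)
    GROUP BY a.id
    -- fewer than 3 strictly better rows means a rank of 1, 2 or 3
    HAVING COUNT(b.id) < 3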
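
    As a rough check of the strict-comparison idea, the sketch below runs it through Python's built-in sqlite3 module. It is a variation rather than a definitive query: it uses a LEFT JOIN and counts b.id, because an INNER JOIN on a.score < b.score would drop any girl with no strictly better row (the top scorer of each group), and it keeps rows with fewer than 3 strictly better girls so that the rank stays at 3 or below. The sample rows other than Sarah and Deborah, and the hair values, are assumptions invented for the demo.

```python
import sqlite3

# In-memory database; the schema and sample rows are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE girl (id INTEGER PRIMARY KEY, name TEXT, hair TEXT, score INTEGER)"
)
conn.executemany(
    "INSERT INTO girl (name, hair, score) VALUES (?, ?, ?)",
    [
        ("Sarah", "red", 9),
        ("Deborah", "red", 9),
        ("Amy", "red", 7),
        ("Beth", "red", 5),
        ("Cara", "red", 4),
        ("Dana", "blonde", 8),
    ],
)

# Rank = 1 + number of girls in the same hair group with a strictly higher score.
# COUNT(b.id) ignores the NULL row that the LEFT JOIN produces for a top scorer.
rows = conn.execute(
    """
    SELECT a.name, COUNT(b.id) + 1 AS ranknum
    FROM girl AS a
      LEFT JOIN girl AS b ON (a.hair = b.hair) AND (a.score < b.score)
    GROUP BY a.id
    HAVING COUNT(b.id) < 3
    ORDER BY a.hair, ranknum, a.name
    """
).fetchall()

strict_ranks = dict(rows)
for name, ranknum in rows:
    print(name, ranknum)
```

    Here the tied girls share rank 1 and the next girl gets rank 3, so ties leave a gap in the numbering. Note also that a tie at the cut-off can return more than 3 rows for a group.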
    

    Can anyone see any problems with this approach that have escaped my notice?
