Producing n rows per group

Question

GROUP BY produces one row per group. I want to produce multiple rows per group. A typical use case is selecting the two cheapest offerings for each item.

This is straightforward when only two or three rows per group are needed:

select type, variety, price
from fruits
where price = (select min(price) from fruits as f where f.type = fruits.type)
   or price = (select min(price) from fruits as f where f.type = fruits.type
      and price > (select min(price) from fruits as f2 where f2.type = fruits.type));

(Select n rows per group in mysql)

But I am looking for a query that can show n rows per group, where n is arbitrarily large. In other words, a query that displays 5 rows per group should be convertible to a query that displays 7 rows per group by just replacing some constants in it.

I am not constrained to any DBMS, so I am interested in any solution that runs on any DBMS. It is fine if it uses some non-standard syntax.

Answer 1:

For any database that supports analytic functions (window functions), this is relatively easy:

select *
  from (select type, 
               variety, 
               price,
               rank() over ([partition by something]
                            order by price) rnk
          from fruits) rank_subquery
 where rnk <= 3

If you omit the [partition by something], you'll get the top three overall rows. If you want the top three for each type, you'd partition by type in your rank() function.

Depending on how you want to handle ties, you may want to use dense_rank() or row_number() rather than rank(). If two rows tie for first, using rank, the next row would have a rnk of 3 while it would have a rnk of 2 with dense_rank. In both cases, both tied rows would have a rnk of 1. row_number would arbitrarily give one of the two tied rows a rnk of 1 and the other a rnk of 2.
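
To make the tie behaviour concrete, here is a minimal sketch that runs the three functions side by side. It assumes SQLite (3.25 or later) driven through Python's sqlite3 module, purely so the example is self-contained; the sample rows are hypothetical, and the table and column names follow the question.

```python
import sqlite3

# Hypothetical sample data: two apple varieties tie for the lowest price.
rows = [
    ("apple", "fuji", 1.00),
    ("apple", "gala", 1.00),
    ("apple", "limbertwig", 2.50),
]

conn = sqlite3.connect(":memory:")  # window functions need SQLite 3.25+
conn.execute("create table fruits (type text, variety text, price real)")
conn.executemany("insert into fruits values (?, ?, ?)", rows)

query = """
select variety,
       rank()       over (partition by type order by price) as rnk,
       dense_rank() over (partition by type order by price) as dense_rnk,
       row_number() over (partition by type order by price) as row_num
  from fruits
 order by price, variety
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Both tied rows get rnk = 1 and dense_rnk = 1, and the third row gets rnk = 3 but dense_rnk = 2. The row_num values 1 and 2 are split between the tied rows in an order the database is free to choose.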

Answer 2:

To save anyone looking some time: at the time of this writing, this apparently won't work in MySQL, because MySQL does not support LIMIT in a subquery used with IN (see https://dev.mysql.com/doc/refman/5.7/en/subquery-restrictions.html).

I've never been a fan of correlated subqueries, as most uses I saw for them could usually be written more simply, but I think this has changed my mind... a little. (This is for MySQL.)

SELECT `type`, `variety`, `price`
FROM `fruits` AS f2
WHERE `price` IN (
   SELECT DISTINCT `price` 
   FROM `fruits` AS f1 
   WHERE f1.type = f2.type
   ORDER BY `price` ASC
   LIMIT X
   )
;

Where X is the "arbitrary" value you wanted.
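
The same query shape can be tried on an engine that does accept LIMIT inside an IN subquery. The sketch below assumes SQLite via Python's sqlite3 module, which is a swap from MySQL made only so the example can run standalone; the sample rows are hypothetical, and n plays the role of X.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the question.
rows = [
    ("apple", "gala", 2.79),
    ("apple", "fuji", 0.24),
    ("apple", "limbertwig", 2.87),
    ("orange", "valencia", 3.59),
    ("orange", "navel", 9.36),
    ("pear", "bradford", 6.05),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table fruits (type text, variety text, price real)")
conn.executemany("insert into fruits values (?, ?, ?)", rows)

# For each outer row, the correlated subquery lists the n lowest distinct
# prices of that row's type; the row is kept if its price is among them.
query = """
select f2.type, f2.variety, f2.price
  from fruits as f2
 where f2.price in (select distinct f1.price
                      from fruits as f1
                     where f1.type = f2.type
                     order by f1.price asc
                     limit ?)
 order by f2.type, f2.price
"""
n = 2  # rows (distinct prices) wanted per group
result = conn.execute(query, (n,)).fetchall()
for row in result:
    print(row)
```

Because the subquery selects distinct prices, a group can return more than n rows when several varieties share one of its n lowest prices.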

If you know how you want to limit further in cases of duplicate prices, and the data permits such limiting ...

SELECT `type`, `variety`, `price`
FROM `fruits` AS f2
WHERE (`price`, `other_identifying_criteria`) IN (
   SELECT DISTINCT `price`, `other_identifying_criteria`
   FROM `fruits` AS f1 
   WHERE f1.type = f2.type
   ORDER BY `price` ASC, `other_identifying_criteria` [ASC|DESC]
   LIMIT X
   )
;

Answer 3:

"Greatest-n-per-group" problems can easily be solved using window functions:

select type, variety, price
from (
  select type, variety, price,
         dense_rank() over (partition by type order by price) as rnk
  from fruits
) t
where rnk <= 5;
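
As a rough illustration of "just replacing a constant", the sketch below passes n as a bound parameter to a dense_rank() query partitioned by type. It assumes SQLite 3.25+ through Python's sqlite3 module; the helper name cheapest_per_type and the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical sample data; gala and honeycrisp tie on price.
rows = [
    ("apple", "fuji", 0.24),
    ("apple", "gala", 2.79),
    ("apple", "honeycrisp", 2.79),
    ("apple", "limbertwig", 2.87),
    ("orange", "valencia", 3.59),
    ("orange", "navel", 9.36),
]

conn = sqlite3.connect(":memory:")  # window functions need SQLite 3.25+
conn.execute("create table fruits (type text, variety text, price real)")
conn.executemany("insert into fruits values (?, ?, ?)", rows)


def cheapest_per_type(conn, n):
    """Return rows whose price is among the n lowest distinct prices of their type."""
    query = """
    select type, variety, price
      from (select type, variety, price,
                   dense_rank() over (partition by type order by price) as rnk
              from fruits) t
     where rnk <= ?
     order by type, price, variety
    """
    return conn.execute(query, (n,)).fetchall()


top2 = cheapest_per_type(conn, 2)
top3 = cheapest_per_type(conn, 3)
print(len(top2), len(top3))
```

With dense_rank(), tied prices share a rank, so the apple group yields three rows for n = 2. Swapping in row_number() would cap every group at exactly n rows, at the cost of breaking ties arbitrarily.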

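Answer 4:

Ranking window functions such as ROW_NUMBER() and DENSE_RANK() have been available since SQL Server 2005, with fuller window-function support arriving in SQL Server 2012. If you would rather not use them, try this out:

SQL Server 2005 and above solution (CROSS APPLY works from 2005; the multi-row VALUES and the DECLARE ... = initializer in the sample setup need SQL Server 2008 or later)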

DECLARE @yourTable TABLE(Category VARCHAR(50), SubCategory VARCHAR(50), price INT)
INSERT INTO @yourTable
VALUES  ('Meat','Steak',1),
        ('Meat','Chicken Wings',3),
        ('Meat','Lamb Chops',5);

DECLARE @n INT = 2;

SELECT DISTINCT Category,CA.SubCategory,CA.price
FROM @yourTable A
CROSS APPLY
(
    SELECT TOP (@n) SubCategory,price
    FROM @yourTable B
    WHERE A.Category = B.Category
    ORDER BY price DESC
) CA

This returns the two highest-priced subcategories per category:

Category                  SubCategory               price
------------------------- ------------------------- -----------
Meat                      Chicken Wings             3
Meat                      Lamb Chops                5

Source: https://stackoverflow.com/questions/30130928/producing-n-rows-per-group

Tags

sql

greatest-n-per-group