Group only certain rows with GROUP BY

Question

SCHEMA

I have the following set-up in a MySQL database:

CREATE TABLE items (
  id SERIAL,
  name VARCHAR(100),
  group_id INT,
  price DECIMAL(10,2),
  KEY items_group_id_idx (group_id),
  PRIMARY KEY (id)
);

INSERT INTO items VALUES 
(1, 'Item A', NULL, 10),
(2, 'Item B', NULL, 20),
(3, 'Item C', NULL, 30),
(4, 'Item D', 1,    40),
(5, 'Item E', 2,    50),
(6, 'Item F', 2,    60),
(7, 'Item G', 2,    70);

PROBLEM

I need to select:

All items with group_id that has NULL value, and

One item from each group identified by group_id having the lowest price.

EXPECTED RESULTS

+----+--------+----------+-------+
| id | name   | group_id | price |
+----+--------+----------+-------+
|  1 | Item A |     NULL | 10.00 | 
|  2 | Item B |     NULL | 20.00 | 
|  3 | Item C |     NULL | 30.00 | 
|  4 | Item D |        1 | 40.00 | 
|  5 | Item E |        2 | 50.00 | 
+----+--------+----------+-------+

POSSIBLE SOLUTION 1: Two queries with UNION ALL

SELECT id, name, group_id, price FROM items
WHERE group_id IS NULL
UNION ALL
SELECT id, name, group_id, MIN(price) FROM items
WHERE group_id IS NOT NULL
GROUP BY group_id;

/* EXPLAIN */
+------+--------------+------------+------+--------------------+--------------------+---------+-------+------+----------------------------------------------+
| id   | select_type  | table      | type | possible_keys      | key                | key_len | ref   | rows | Extra                                        |
+------+--------------+------------+------+--------------------+--------------------+---------+-------+------+----------------------------------------------+
|    1 | PRIMARY      | items      | ref  | items_group_id_idx | items_group_id_idx | 5       | const |    3 | Using where                                  |
|    2 | UNION        | items      | ALL  | items_group_id_idx | NULL               | NULL    | NULL  |    7 | Using where; Using temporary; Using filesort |
| NULL | UNION RESULT | <union1,2> | ALL  | NULL               | NULL               | NULL    | NULL  | NULL |                                              |
+------+--------------+------------+------+--------------------+--------------------+---------+-------+------+----------------------------------------------+

However, having two queries is undesirable, since the WHERE clause will contain more complex conditions and I would also need to sort the final results.

POSSIBLE SOLUTION 2: GROUP BY on expression (reference)

SELECT id, name, group_id, MIN(price) FROM items
GROUP BY CASE WHEN group_id IS NOT NULL THEN group_id ELSE RAND() END;

/* EXPLAIN */
+----+-------------+-------+------+---------------+------+---------+------+------+---------------------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra                           |
+----+-------------+-------+------+---------------+------+---------+------+------+---------------------------------+
|  1 | SIMPLE      | items | ALL  | NULL          | NULL | NULL    | NULL |    7 | Using temporary; Using filesort | 
+----+-------------+-------+------+---------------+------+---------+------+------+---------------------------------+

Solution 2 seems to be faster and simpler to use, but I'm wondering whether there is a better approach in terms of performance.

UPDATE:

According to the documentation referenced by @axiac, this query is illegal in SQL-92 and earlier, and may work only in MySQL.

Answer 1:

According to this answer by @axiac, a better solution in terms of compatibility and performance is shown below.

It is also explained in the book SQL Antipatterns, Chapter 15: Ambiguous Groups.

To improve performance, a composite index on (group_id, price, id) is also added.

SOLUTION

SELECT a.id, a.name, a.group_id, a.price
FROM items a
LEFT JOIN items b 
ON a.group_id = b.group_id 
AND (a.price > b.price OR (a.price = b.price and a.id > b.id))
WHERE b.price is NULL;

See the explanation of how it works for more details.

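As a side effect, this query also covers my case, where I needed ALL records whose group_id is NULL plus the lowest-priced item from each group. The join condition a.group_id = b.group_id is never true when group_id is NULL, so those rows never find a match in b and all of them pass the b.price IS NULL filter.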

RESULT

+----+--------+----------+-------+
| id | name   | group_id | price |
+----+--------+----------+-------+
|  1 | Item A |     NULL | 10.00 | 
|  2 | Item B |     NULL | 20.00 | 
|  3 | Item C |     NULL | 30.00 | 
|  4 | Item D |        1 | 40.00 | 
|  5 | Item E |        2 | 50.00 | 
+----+--------+----------+-------+

EXPLAIN

+----+-------------+-------+------+-------------------------------+--------------------+---------+----------------------------+------+--------------------------+
| id | select_type | table | type | possible_keys                 | key                | key_len | ref                        | rows | Extra                    |
+----+-------------+-------+------+-------------------------------+--------------------+---------+----------------------------+------+--------------------------+
|  1 | SIMPLE      | a     | ALL  | NULL                          | NULL               | NULL    | NULL                       |    7 |                          | 
|  1 | SIMPLE      | b     | ref  | PRIMARY,id,items_group_id_idx | items_group_id_idx | 5       | agi_development.a.group_id |    1 | Using where; Using index | 
+----+-------------+-------+------+-------------------------------+--------------------+---------+----------------------------+------+--------------------------+
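
The NULL-handling behaviour of this query can be checked on the sample data. The following is a minimal sketch, assuming Python's built-in sqlite3 module with an in-memory SQLite database standing in for MySQL. The table definition is simplified, the index name items_group_price_id_idx is made up for illustration, and an ORDER BY is added only to make the output order predictable. MySQL's execution plan and performance are not reproduced here.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (
  id INTEGER PRIMARY KEY,
  name VARCHAR(100),
  group_id INT,
  price DECIMAL(10,2)
);
-- Composite index mentioned in the answer; the index name is hypothetical.
CREATE INDEX items_group_price_id_idx ON items (group_id, price, id);
INSERT INTO items VALUES
(1, 'Item A', NULL, 10),
(2, 'Item B', NULL, 20),
(3, 'Item C', NULL, 30),
(4, 'Item D', 1,    40),
(5, 'Item E', 2,    50),
(6, 'Item F', 2,    60),
(7, 'Item G', 2,    70);
""")

# Anti-join: keep a row only if no row b in the same group is "better"
# (cheaper, or equally priced with a smaller id).
rows = conn.execute("""
SELECT a.id, a.name, a.group_id, a.price
FROM items a
LEFT JOIN items b
  ON a.group_id = b.group_id
 AND (a.price > b.price OR (a.price = b.price AND a.id > b.id))
WHERE b.price IS NULL
ORDER BY a.id
""").fetchall()

for row in rows:
    print(row)
```

Rows 1 to 3 come back because their NULL group_id never matches anything in the join, and rows 4 and 5 come back because no cheaper row exists in their group.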

Answer 2:

If group_id is always a positive value, you can simplify it without GUID/RAND:

SELECT id, name, min(price) FROM items
GROUP BY COALESCE(group_id, -id); -- id is already unique

But neither query will return a correct result if you change the order of the inserts. I'll add a Fiddle when it's working again...

Gordon's query should work as expected, or you can use an old trick to carry another column along with MIN: piggybacking.

You concatenate multiple columns into one fixed-length string, with the MIN column first, and apply MIN to this string. In the next step you extract the columns again using matching SUBSTRING calls:

SELECT
   CASE WHEN grp > 0 THEN grp ELSE NULL END AS group_id
   ,CAST(SUBSTRING(x FROM 1 FOR 13) AS DECIMAL(10,2)) AS price
   ,SUBSTRING(x FROM 24) AS name
FROM
 (
   SELECT COALESCE(group_id, -id) AS grp
      -- results in a string like this
      -- '        50.00         5Item E'
      -- MySQL syntax: CONCAT instead of ||, CAST ... AS CHAR,
      -- and the three-argument form of LPAD
      ,MIN(CONCAT(LPAD(CAST(price AS CHAR(13)), 13, ' '),
                  LPAD(CAST(id AS CHAR(10)), 10, ' '),
                  name)) AS x
   FROM items
   GROUP BY grp
 ) AS dt;
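
The piggybacking idea does not depend on SQL. Put the price first, right-aligned to a fixed width so that string order matches numeric order, take the minimum of the packed strings per group, then slice the columns back out. Here is a minimal Python sketch of those steps on the sample rows from the question. The helper names pack and unpack are made up for illustration, and non-negative prices with the same number of decimals are assumed.

```python
from decimal import Decimal

# Sample rows from the question: (id, name, group_id, price).
ITEMS = [
    (1, "Item A", None, Decimal("10.00")),
    (2, "Item B", None, Decimal("20.00")),
    (3, "Item C", None, Decimal("30.00")),
    (4, "Item D", 1, Decimal("40.00")),
    (5, "Item E", 2, Decimal("50.00")),
    (6, "Item F", 2, Decimal("60.00")),
    (7, "Item G", 2, Decimal("70.00")),
]


def pack(item_id, name, price):
    # Price first (width 13), then id (width 10), then the name.
    # Space padding sorts before digits, so string order follows numeric order.
    return str(price).rjust(13) + str(item_id).rjust(10) + name


def unpack(packed):
    # Slice the fixed-width fields back out (mirrors the SUBSTRING calls).
    price = Decimal(packed[:13].strip())
    item_id = int(packed[13:23])
    name = packed[23:]
    return item_id, name, price


best = {}
for item_id, name, group_id, price in ITEMS:
    # Same grouping key as COALESCE(group_id, -id).
    grp = group_id if group_id is not None else -item_id
    packed = pack(item_id, name, price)
    if grp not in best or packed < best[grp]:
        best[grp] = packed

result = []
for grp, packed in best.items():
    item_id, name, price = unpack(packed)
    result.append((item_id, name, grp if grp > 0 else None, price))
result.sort()  # ids are unique, so only the first tuple element is compared

for row in result:
    print(row)
```

Because the id is packed right after the price, ties on price are broken by the smaller id, and the id can be recovered as well as the name.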

Answer 3:

You can do this using where conditions:

select t.*
from t
where t.group_id is null or
      t.price = (select min(t2.price)
                 from t t2
                 where t2.group_id = t.group_id
                );

Note that this returns all rows with the minimum price, if there is more than one for a given group.

EDIT:

I believe the following fixes the problem of multiple rows:

select t.*
from t
where t.group_id is null or
      t.id = (select t2.id
              from t t2
              where t2.group_id = t.group_id
              order by t2.price asc
              limit 1
             );

Unfortunately, SQL Fiddle is not working for me right now, so I cannot test it.
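
The difference between the two queries can be checked with a small script. This is a minimal sketch, assuming Python's built-in sqlite3 module with in-memory SQLite in place of MySQL. The table is named items as in the question rather than t, and an extra row 'Item H' (hypothetical, not in the original data) is added so that two rows in group 2 tie on the lowest price. The id tiebreaker in the ORDER BY is an addition that makes the chosen row deterministic; it is not part of the queries above.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT, group_id INT, price NUMERIC);
-- Row 8 is a hypothetical addition that ties with Item E on price.
INSERT INTO items VALUES
(1, 'Item A', NULL, 10),
(2, 'Item B', NULL, 20),
(3, 'Item C', NULL, 30),
(4, 'Item D', 1,    40),
(5, 'Item E', 2,    50),
(6, 'Item F', 2,    60),
(7, 'Item G', 2,    70),
(8, 'Item H', 2,    50);
""")

# First query: compares on price, so every row tied for the minimum is kept.
by_min_price = [r[0] for r in conn.execute("""
SELECT t.id
FROM items t
WHERE t.group_id IS NULL
   OR t.price = (SELECT MIN(t2.price)
                 FROM items t2
                 WHERE t2.group_id = t.group_id)
ORDER BY t.id
""")]

# Second query: compares on id, so exactly one row per group is kept.
by_first_id = [r[0] for r in conn.execute("""
SELECT t.id
FROM items t
WHERE t.group_id IS NULL
   OR t.id = (SELECT t2.id
              FROM items t2
              WHERE t2.group_id = t.group_id
              ORDER BY t2.price ASC, t2.id ASC
              LIMIT 1)
ORDER BY t.id
""")]

print(by_min_price)
print(by_first_id)
```

On this data the first query returns ids 1, 2, 3, 4, 5 and 8, while the second returns ids 1 to 5 only.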

Source: https://stackoverflow.com/questions/36010981/group-only-certain-rows-with-group-by

Tags

mysql

sql

group-by

greatest-n-per-group