Question
SCHEMA
I have the following setup in a MySQL database:
CREATE TABLE items (
id SERIAL,
name VARCHAR(100),
group_id INT,
price DECIMAL(10,2),
KEY items_group_id_idx (group_id),
PRIMARY KEY (id)
);
INSERT INTO items VALUES
(1, 'Item A', NULL, 10),
(2, 'Item B', NULL, 20),
(3, 'Item C', NULL, 30),
(4, 'Item D', 1, 40),
(5, 'Item E', 2, 50),
(6, 'Item F', 2, 60),
(7, 'Item G', 2, 70);
PROBLEM
I need to select:
- All items whose group_id is NULL, and
- One item from each group identified by group_id, the one with the lowest price.
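To pin down the requirement, here is a plain-Python sketch (no database involved) that computes the expected rows from the sample data. Breaking ties on equal prices in favour of the lower id is an assumption; the question does not say which row to keep.

```python
# Sample rows from the question: (id, name, group_id, price); None stands for NULL.
ITEMS = [
    (1, "Item A", None, 10),
    (2, "Item B", None, 20),
    (3, "Item C", None, 30),
    (4, "Item D", 1, 40),
    (5, "Item E", 2, 50),
    (6, "Item F", 2, 60),
    (7, "Item G", 2, 70),
]


def expected_rows(items):
    """Every NULL-group row plus the cheapest row of each non-NULL group."""
    result = [row for row in items if row[2] is None]
    cheapest = {}
    for row in items:
        group_id = row[2]
        if group_id is None:
            continue
        current = cheapest.get(group_id)
        # Assumption: equal prices are resolved in favour of the lower id.
        if current is None or (row[3], row[0]) < (current[3], current[0]):
            cheapest[group_id] = row
    result.extend(cheapest.values())
    return sorted(result, key=lambda row: row[0])


for row in expected_rows(ITEMS):
    print(row)
```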
EXPECTED RESULTS
+----+--------+----------+-------+
| id | name | group_id | price |
+----+--------+----------+-------+
| 1 | Item A | NULL | 10.00 |
| 2 | Item B | NULL | 20.00 |
| 3 | Item C | NULL | 30.00 |
| 4 | Item D | 1 | 40.00 |
| 5 | Item E | 2 | 50.00 |
+----+--------+----------+-------+
POSSIBLE SOLUTION 1: Two queries with UNION ALL
SELECT id, name, group_id, price FROM items
WHERE group_id IS NULL
UNION ALL
SELECT id, name, group_id, MIN(price) FROM items
WHERE group_id IS NOT NULL
GROUP BY group_id;
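Below is a runnable sketch of the UNION ALL shape. It uses Python's built-in sqlite3 module as a stand-in for MySQL, with column types simplified to INTEGER/TEXT/NUMERIC. The second branch is a swapped-in technique, not the query above: it joins each row to its group's minimum price instead of selecting non-aggregated columns next to MIN(price). As a result, rows tied at the minimum would all be returned. The trailing ORDER BY sorts the combined result.

```python
import sqlite3

# Sample rows from the question: (id, name, group_id, price); None stands for NULL.
ITEMS = [
    (1, "Item A", None, 10),
    (2, "Item B", None, 20),
    (3, "Item C", None, 30),
    (4, "Item D", 1, 40),
    (5, "Item E", 2, 50),
    (6, "Item F", 2, 60),
    (7, "Item G", 2, 70),
]

# Second branch: join back to the per-group minimum (returns all tied rows).
QUERY = """
SELECT id, name, group_id, price FROM items
WHERE group_id IS NULL
UNION ALL
SELECT i.id, i.name, i.group_id, i.price
FROM items i
JOIN (SELECT group_id, MIN(price) AS min_price
      FROM items
      WHERE group_id IS NOT NULL
      GROUP BY group_id) m
  ON i.group_id = m.group_id AND i.price = m.min_price
ORDER BY 1  -- sort the combined result by the first column (id)
"""

# SQLite in memory stands in for the MySQL table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT, group_id INTEGER, price NUMERIC)"
)
conn.executemany("INSERT INTO items VALUES (?, ?, ?, ?)", ITEMS)
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```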
/* EXPLAIN */
+----+--------------+------------+------+--------------------+--------------------+---------+-------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------+------------+------+--------------------+--------------------+---------+-------+------+----------------------------------------------+
| 1 | PRIMARY | items | ref | items_group_id_idx | items_group_id_idx | 5 | const | 3 | Using where |
| 2 | UNION | items | ALL | items_group_id_idx | NULL | NULL | NULL | 7 | Using where; Using temporary; Using filesort |
| NULL | UNION RESULT | <union1,2> | ALL | NULL | NULL | NULL | NULL | NULL | |
+----+--------------+------------+------+--------------------+--------------------+---------+-------+------+----------------------------------------------+
However, having two queries is undesirable, since there will be a more complex condition in the WHERE clause and I would need to sort the final results.
POSSIBLE SOLUTION 2: GROUP BY on expression (reference)
SELECT id, name, group_id, MIN(price) FROM items
GROUP BY CASE WHEN group_id IS NOT NULL THEN group_id ELSE RAND() END;
/* EXPLAIN */
+----+-------------+-------+------+---------------+------+---------+------+------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+------+---------------------------------+
| 1 | SIMPLE | items | ALL | NULL | NULL | NULL | NULL | 7 | Using temporary; Using filesort |
+----+-------------+-------+------+---------------+------+---------+------+------+---------------------------------+
Solution 2 seems faster and simpler to use, but I'm wondering whether there is a better approach in terms of performance.
UPDATE:
According to the documentation referenced by @axiac, this query is illegal in SQL-92 and earlier and may only work in MySQL.
Answer 1:
According to this answer by @axiac, a better solution in terms of compatibility and performance is shown below.
It is also explained in the SQL Antipatterns book, Chapter 15: Ambiguous Groups.
To improve performance, a composite index on (group_id, price, id) is also added.
SOLUTION
SELECT a.id, a.name, a.group_id, a.price
FROM items a
LEFT JOIN items b
ON a.group_id = b.group_id
AND (a.price > b.price OR (a.price = b.price and a.id > b.id))
WHERE b.price is NULL;
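See the explanation of how it works for more details.
As a side effect, this query also works in my case, where I needed to include ALL records with group_id equal to NULL AND one item from each group with the lowest price. The join condition a.group_id = b.group_id is never true when group_id is NULL. Those rows therefore never find a match in b, b.price stays NULL, and they are all kept.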
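To see the NULL behaviour concretely, here is a sketch that runs the same LEFT JOIN in Python's built-in sqlite3, standing in for MySQL. The composite index on (group_id, price, id) is created under a hypothetical name, items_group_price_id_idx. The extra Item H row is made-up test data that ties with Item E on price.

```python
import sqlite3

# Sample rows from the question: (id, name, group_id, price); None stands for NULL.
ITEMS = [
    (1, "Item A", None, 10),
    (2, "Item B", None, 20),
    (3, "Item C", None, 30),
    (4, "Item D", 1, 40),
    (5, "Item E", 2, 50),
    (6, "Item F", 2, 60),
    (7, "Item G", 2, 70),
]

# Same LEFT JOIN as the answer; SQLite stands in for MySQL here.
QUERY = """
SELECT a.id, a.name, a.group_id, a.price
FROM items a
LEFT JOIN items b
  ON a.group_id = b.group_id
 AND (a.price > b.price OR (a.price = b.price AND a.id > b.id))
WHERE b.price IS NULL
ORDER BY a.id
"""


def surviving_ids(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT, group_id INTEGER, price NUMERIC)"
    )
    # Composite index from the answer; the index name is hypothetical.
    conn.execute("CREATE INDEX items_group_price_id_idx ON items (group_id, price, id)")
    conn.executemany("INSERT INTO items VALUES (?, ?, ?, ?)", rows)
    return [row[0] for row in conn.execute(QUERY)]


# NULL = NULL is never true, so ids 1-3 never find a cheaper partner in b.
print(surviving_ids(ITEMS))
# Hypothetical Item H ties with Item E at 50; the id comparison keeps only id 5.
print(surviving_ids(ITEMS + [(8, "Item H", 2, 50)]))
```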
RESULT
+----+--------+----------+-------+
| id | name | group_id | price |
+----+--------+----------+-------+
| 1 | Item A | NULL | 10.00 |
| 2 | Item B | NULL | 20.00 |
| 3 | Item C | NULL | 30.00 |
| 4 | Item D | 1 | 40.00 |
| 5 | Item E | 2 | 50.00 |
+----+--------+----------+-------+
EXPLAIN
+----+-------------+-------+------+-------------------------------+--------------------+---------+----------------------------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+-------------------------------+--------------------+---------+----------------------------+------+--------------------------+
| 1 | SIMPLE | a | ALL | NULL | NULL | NULL | NULL | 7 | |
| 1 | SIMPLE | b | ref | PRIMARY,id,items_group_id_idx | items_group_id_idx | 5 | agi_development.a.group_id | 1 | Using where; Using index |
+----+-------------+-------+------+-------------------------------+--------------------+---------+----------------------------+------+--------------------------+
Answer 2:
If group_id is always a positive value, you can simplify it without GUID/RAND:
SELECT id, name, min(price) FROM items
GROUP BY COALESCE(group_id, -id); -- id is already unique
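The grouping key COALESCE(group_id, -id) can be sketched in plain Python to show which rows end up together. Each NULL-group row gets its own negative key, while real groups keep their positive group_id. The key only fixes the grouping. It says nothing about which id and name MySQL reports next to MIN(price).

```python
# Sample rows from the question: (id, name, group_id, price); None stands for NULL.
ITEMS = [
    (1, "Item A", None, 10),
    (2, "Item B", None, 20),
    (3, "Item C", None, 30),
    (4, "Item D", 1, 40),
    (5, "Item E", 2, 50),
    (6, "Item F", 2, 60),
    (7, "Item G", 2, 70),
]


def group_key(row):
    """Mirror COALESCE(group_id, -id): NULL groups get a unique negative key."""
    item_id, _name, group_id, _price = row
    return group_id if group_id is not None else -item_id


groups = {}
for row in ITEMS:
    groups.setdefault(group_key(row), []).append(row[0])
print(groups)  # {-1: [1], -2: [2], -3: [3], 1: [4], 2: [5, 6, 7]}
```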
But neither query is guaranteed to return the correct result. If you change the order of the INSERTs, the id and name returned next to MIN(price) may come from a different row. I'll add a Fiddle when it's working again...
Gordon's query should work as expected. Alternatively, you can use an old trick to get other columns along with the MIN: piggybacking.
You concatenate multiple columns into a fixed-length string, with the MIN column first, and apply MIN to this string. In the next step you extract the columns again using the matching SUBSTRING:
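SELECT
 CAST(SUBSTRING(x FROM 14 FOR 10) AS UNSIGNED) AS id
,SUBSTRING(x FROM 24) AS name
,CASE WHEN grp > 0 THEN grp ELSE NULL END AS group_id
,CAST(SUBSTRING(x FROM 1 FOR 13) AS DECIMAL(10,2)) AS price
FROM
(
SELECT COALESCE(group_id, -id) AS grp
-- In MySQL, || is logical OR unless PIPES_AS_CONCAT is set, CAST takes CHAR
-- rather than VARCHAR, and LPAD() needs a pad string as its third argument.
-- results in a string like this
-- '        50.00         5Item E'
,MIN(CONCAT(LPAD(CAST(price AS CHAR), 13, ' ')
,LPAD(CAST(id AS CHAR), 10, ' ')
,name)) AS x
FROM items
GROUP BY grp
) AS dt;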
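The piggyback encoding can be checked outside the database with a plain-Python sketch that mirrors the same widths: 13 characters for the price, 10 for the id, then the name. It assumes non-negative prices with exactly two decimals. A string MIN only orders numbers correctly when they are right-aligned in the same format. The names pack and unpack are hypothetical helpers.

```python
from decimal import Decimal

# Sample rows from the question, with DECIMAL(10,2)-style prices; None stands for NULL.
ITEMS = [
    (1, "Item A", None, Decimal("10.00")),
    (2, "Item B", None, Decimal("20.00")),
    (3, "Item C", None, Decimal("30.00")),
    (4, "Item D", 1, Decimal("40.00")),
    (5, "Item E", 2, Decimal("50.00")),
    (6, "Item F", 2, Decimal("60.00")),
    (7, "Item G", 2, Decimal("70.00")),
]


def pack(item_id, name, price):
    """Hypothetical helper: price right-aligned in 13 chars, id in 10, then name."""
    return f"{price:>13.2f}{item_id:>10d}{name}"


def unpack(packed):
    """Hypothetical helper: reverse pack() using the SUBSTRING offsets above."""
    price = Decimal(packed[0:13].strip())  # SUBSTRING(x FROM 1 FOR 13)
    item_id = int(packed[13:23])           # SUBSTRING(x FROM 14 FOR 10)
    name = packed[23:]                     # SUBSTRING(x FROM 24)
    return item_id, name, price


# Emulate GROUP BY COALESCE(group_id, -id) with MIN() over the packed string.
smallest = {}
for item_id, name, group_id, price in ITEMS:
    key = group_id if group_id is not None else -item_id
    packed = pack(item_id, name, price)
    if key not in smallest or packed < smallest[key]:
        smallest[key] = packed

result = sorted(unpack(p) for p in smallest.values())
print(repr(pack(5, "Item E", Decimal("50.00"))))
for row in result:
    print(row)
```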
Answer 3:
You can do this using WHERE conditions:
SQLFiddle Demo
select t.*
from items t
where t.group_id is null or
      t.price = (select min(t2.price)
                 from items t2
                 where t2.group_id = t.group_id
                );
Note that if several rows in a group share the minimum price, this returns all of them.
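A quick way to see the tie behaviour is to add a hypothetical second group-2 item priced 50 and run the query in Python's built-in sqlite3, used here as a stand-in for MySQL:

```python
import sqlite3

# Sample rows from the question: (id, name, group_id, price); None stands for NULL.
ITEMS = [
    (1, "Item A", None, 10),
    (2, "Item B", None, 20),
    (3, "Item C", None, 30),
    (4, "Item D", 1, 40),
    (5, "Item E", 2, 50),
    (6, "Item F", 2, 60),
    (7, "Item G", 2, 70),
]

# Correlated-subquery version from the answer, run against SQLite as a MySQL stand-in.
QUERY = """
SELECT t.id
FROM items t
WHERE t.group_id IS NULL
   OR t.price = (SELECT MIN(t2.price)
                 FROM items t2
                 WHERE t2.group_id = t.group_id)
ORDER BY t.id
"""


def matching_ids(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT, group_id INTEGER, price NUMERIC)"
    )
    conn.executemany("INSERT INTO items VALUES (?, ?, ?, ?)", rows)
    return [row[0] for row in conn.execute(QUERY)]


print(matching_ids(ITEMS))
# Hypothetical Item H ties with Item E at 50, so group 2 now yields two rows.
print(matching_ids(ITEMS + [(8, "Item H", 2, 50)]))
```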
EDIT:
I believe the following fixes the problem of multiple rows:
select t.*
from items t
where t.group_id is null or
      t.id = (select t2.id
              from items t2
              where t2.group_id = t.group_id
              order by t2.price asc, t2.id asc
              limit 1
             );
Unfortunately, SQL Fiddle is not working for me right now, so I cannot test it.
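Here is the same sqlite3 sketch (a stand-in for MySQL) applied to the LIMIT 1 version. It adds t2.id as a tiebreaker, which is an assumption: without it, the order of tied rows is unspecified. The hypothetical Item H row ties with Item E again:

```python
import sqlite3

# Sample rows from the question: (id, name, group_id, price); None stands for NULL.
ITEMS = [
    (1, "Item A", None, 10),
    (2, "Item B", None, 20),
    (3, "Item C", None, 30),
    (4, "Item D", 1, 40),
    (5, "Item E", 2, 50),
    (6, "Item F", 2, 60),
    (7, "Item G", 2, 70),
]

# LIMIT 1 version; "t2.id ASC" is an added tiebreaker so ties resolve to the lower id.
QUERY = """
SELECT t.id
FROM items t
WHERE t.group_id IS NULL
   OR t.id = (SELECT t2.id
              FROM items t2
              WHERE t2.group_id = t.group_id
              ORDER BY t2.price ASC, t2.id ASC
              LIMIT 1)
ORDER BY t.id
"""


def picked_ids(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT, group_id INTEGER, price NUMERIC)"
    )
    conn.executemany("INSERT INTO items VALUES (?, ?, ?, ?)", rows)
    return [row[0] for row in conn.execute(QUERY)]


print(picked_ids(ITEMS))
# With the hypothetical tie at 50, only id 5 is picked for group 2.
print(picked_ids(ITEMS + [(8, "Item H", 2, 50)]))
```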
Source: https://stackoverflow.com/questions/36010981/group-only-certain-rows-with-group-by