问题
Possible Duplicate:
SQL: Find the max record per group
I have a table with four columns as such:
name major minor revision
p1 0 4 3
p1 1 0 0
p1 1 1 4
p2 1 1 1
p2 2 5 0
p3 3 4 4
This is basically ca table containing records for each version of a program. I want to do a select to get all of the programs and their latest version so the results would look like this:
name major minor revision
p1 1 1 4
p2 2 5 0
p3 3 4 4
I can't just group by the name and get the max of each column because then i would just end up with the highest number from each column, but not the specific row with the highest version. How can I set this up?
回答1:
The way I try to solve SQL problems is to take things step by step.
- You want the maximum revision for the maximum minor version corresponding to the maximum major version for each product.
The maximum major number for each product is given by:
SELECT Name, MAX(major) AS Major FROM CA GROUP BY Name;
The maximum minor number corresponding to the maximum major number for each product is therefore given by:
SELECT CA.Name, CA.Major, MAX(CA.Minor) AS Minor
FROM CA
JOIN (SELECT Name, MAX(Major) AS Major
FROM CA
GROUP BY Name
) AS CB
ON CA.Name = CB.Name AND CA.Major = CB.Major
GROUP BY CA.Name, CA.Major;
And the maximum revision (for the maximum minor version number corresponding to the maximum major number for each product), therefore, is given by:
SELECT CA.Name, CA.Major, CA.Minor, MAX(CA.Revision) AS Revision
FROM CA
JOIN (SELECT CA.Name, CA.Major, MAX(CA.Minor) AS Minor
FROM CA
JOIN (SELECT Name, MAX(Major) AS Major
FROM CA
GROUP BY Name
) AS CB
ON CA.Name = CB.Name AND CA.Major = CB.Major
GROUP BY CA.Name, CA.Major
) AS CC
ON CA.Name = CC.Name AND CA.Major = CC.Major AND CA.Minor = CC.Minor
GROUP BY CA.Name, CA.Major, CA.Minor;
Tested - it works and produces the same answer as Andomar's query does.
Performance
I created a bigger volume of data (11616 rows of data), and ran a benchmark timing of Andomar's query against mine - target DBMS was IBM Informix Dynamic Server (IDS) version 11.70.FC2 running on MacOS X 10.7.2. I used the first of Andomar's two queries since IDS does not support the comparison notation in the second one. I loaded the data, updated statistics, and ran the queries both with mine followed by Andomar's and with Andomar's followed by mine. I also recorded the basic costs reported by the IDS optimizer. The result data from both queries were the same (so the queries are both accurate - or equally inaccurate).
Table unindexed:
Andomar's query Jonathan's query
Time: 22.074129 Time: 0.085803
Estimated Cost: 2468070 Estimated Cost: 22673
Estimated # of Rows Returned: 5808 Estimated # of Rows Returned: 132
Temporary Files Required For: Order By Temporary Files Required For: Group By
Table with unique index on (name, major, minor, revision):
Andomar's query Jonathan's query
Time: 0.768309 Time: 0.060380
Estimated Cost: 31754 Estimated Cost: 2329
Estimated # of Rows Returned: 5808 Estimated # of Rows Returned: 139
Temporary Files Required For: Group By
As you can seen, the index dramatically improves the performance of Andomar's query, but it still seems to be more expensive on this system than my query. The index gives a 25% time saving for my query. I'd be curious to see comparable figures for the two versions of Andomar's query on comparable volumes of data, with and without the index. (My test data can be supplied if you need it; there were 132 products - the 3 listed in the question and 129 new ones; each new product had (the same) 90 version entries.)
The reason for the discrepancy is that the sub-query in Andomar's query is a correlated sub-query, which is a relatively expensive process (dramatically so when the index is missing).
回答2:
You can use a not exists
subquery to filter out older records:
select *
from YourTable yt
where not exists
(
select *
from YourTable older
where yt.name = older.name and
(
yt.major < older.major or
yt.major = older.major and yt.minor < older.minor or
yt.major = older.major and yt.minor = older.minor and
yt.revision < older.revision
)
)
which can also be written in MySQL as:
select *
from YourTable yt
where not exists
(
select *
from YourTable older
where yt.name = older.name and
(yt.major, yt.minor, yt.revision)
< (older.major, older.major, older.revision)
)
回答3:
SELECT cam.*
FROM
( SELECT DISTINCT name
FROM ca
) AS cadistinct
JOIN
ca AS cam
ON ( cam.name, cam.major, cam.minor, cam.revision )
= ( SELECT name, major, minor, revision
FROM ca
WHERE name = cadistinct.name
ORDER BY major DESC
, minor DESC
, revision DESC
LIMIT 1
)
This will work in MySQL (current versions) but I woudn't recommend it:
SELECT *
FROM
( SELECT name, major, minor, revision
FROM ca
ORDER BY name
, major DESC
, minor DESC
, revision DESC
) AS tmp
GROUP BY name
回答4:
Update3 variable group_concat_max_len has a minvalue = 4 so we can't use it. But you can:
select
name,
SUBSTRING_INDEX(group_concat(major order by major desc),',', 1) as major,
SUBSTRING_INDEX(group_concat(minor order by major desc, minor desc),',', 1)as minor,
SUBSTRING_INDEX(group_concat(revision order by major desc, minor desc, revision desc),',', 1) as revision
from your_table
group by name;
this was tested here and no, the previous version does not provide wrong results, it had only the problem with number of concatenated values.
回答5:
If there are numbers in those columns, you could come up with some kind of a formula that will be unique and well ordered for the major, minor, revision values. E.g. if the numbers are less than 10, you could just append them as strings, and compare them, like:
select name, major, minor, revision,
concat(major, minor, revision) as version
from versions
If they are numbers that will not be larger than 100, you could do something like:
select name, major, minor, revision,
(major * 10000 + minor * 100 + revision) as version
from versions
You could than just get the max
of version
grouped by name, like this:
select name, major, minor, revision
from (
select name, major, minor, revision,
(major * 10000 + minor * 100 + revision) as version
from versions) v1
where version = (select max (major * 10000 + minor * 100 + revision)
from versions v2
where v1.name = v2.name)
回答6:
It allows max three digits per part of version number. If you want to use more digits then add two zeros to major multiplication an one zero to minor multiplication for each digit (I hope it's clear).
select t.*
from yourTable t
join (
select name, max(major * 1000000 + minor * 1000 + revision) as ver
from yourTable
group by name
) t1 on t1.ver = (t.major * 1000000 + t.minor * 1000 + t.revision)
Result:
name major minor revision
p1 1 1 4
p2 2 5 0
p3 3 4 4
回答7:
Am I the only one thinking that the greatest version is the one with the highest revision?
So,
select a.name, a.major, a.minor, a.revision
from table a
where a.revision = (select max(b.revision) from table b where b.name = a.name)
来源:https://stackoverflow.com/questions/8723404/how-to-select-single-row-based-on-the-max-value-in-multiple-rows