Question
I'm trying to build a query that gets only the latest record in each group of records, based on date.
The layout of the table is as follows:
| date | category | action | label | label2 | count_today | count_total | period |
The primary key is made up of the columns date, category, action, label, label2 and period. date has the format yyyy-mm-dd, and period can have the values Day, Week or Month.
For each unique combination of category, action, label and label2, I need the record with the latest date.
My first attempt was this:
SELECT * FROM `statistic`
WHERE
(action='total' OR action='' OR category='user')
AND
(period='day'
OR (period='week' AND DATEDIFF(now(), `date`) > 30)
OR (period = 'Month' AND DATEDIFF(now(), `date`) > 7*26)
)
GROUP BY category, action, label, label2
ORDER BY date DESC
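The problem with this query is that GROUP BY is applied before ORDER BY. MySQL takes the non-aggregated column values from an indeterminate row in each group, and ORDER BY only sorts the already-grouped result, so the returned records are not necessarily the latest ones.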
After searching, I found that what I want is called a group-wise maximum query.
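To make the goal concrete, here is a minimal sketch of what a group-wise maximum computes, written in plain Python rather than SQL. The sample rows are made up, and only a few of the table's columns are kept: for each (category, action, label, label2) key, the row with the greatest date wins.

```python
from datetime import date

# Hypothetical sample rows: (date, category, action, label, label2, count_today)
rows = [
    (date(2012, 11, 1), "user", "total", "a", "", 5),
    (date(2012, 11, 3), "user", "total", "a", "", 7),
    (date(2012, 11, 2), "page", "view", "b", "x", 4),
    (date(2012, 10, 30), "page", "view", "b", "x", 9),
]


def latest_per_group(rows):
    """Return {(category, action, label, label2): row with the greatest date}."""
    best = {}
    for row in rows:
        key = row[1:5]  # the grouping columns
        if key not in best or row[0] > best[key][0]:
            best[key] = row
    return best


result = latest_per_group(rows)
```

Note that the winning row is chosen by date, not by the largest count_today, which is exactly what a plain GROUP BY cannot guarantee.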
My next attempt was this:
SELECT s1.* FROM `statistic` AS s1
LEFT JOIN statistic AS s2
ON
s1.category = s2.category
AND s1.action = s2.action
AND s1.label = s2.label
AND s1.label2 = s2.label2
AND s1.date > s2.date
WHERE
(s1.action='total' OR s1.action='' OR s1.category='user')
AND
(s1.period='day'
OR (s1.period='week' AND DATEDIFF(now(), s1.`date`) > 30)
OR (s1.period = 'Month' AND DATEDIFF(now(), s1.`date`) > 7*26)
)
GROUP BY category, action, label, label2
But this query doesn't give me the correct results either (they look similar to those of the first query).
Any clue how I can get the data that I need?
Answer 1:
You are right that you want a group-wise maximum. One way to get it is to join your table with a subquery that finds the latest date for each group:
SELECT * FROM statistic NATURAL JOIN (
SELECT category, action, label, label2, MAX(date) date
FROM statistic
GROUP BY category, action, label, label2
) t
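A minimal sketch of the query above in action, using Python's built-in sqlite3 module as a stand-in for MySQL (an assumption; the answer targets MySQL) and made-up sample rows. The NATURAL JOIN matches on the columns both sides share (category, action, label, label2 and date), so only rows whose date equals their group's MAX(date) survive.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE statistic (
        date TEXT, category TEXT, action TEXT, label TEXT, label2 TEXT,
        count_today INTEGER, count_total INTEGER, period TEXT,
        PRIMARY KEY (date, category, action, label, label2, period)
    )
""")
# Hypothetical sample data; yyyy-mm-dd strings compare correctly as text.
conn.executemany(
    "INSERT INTO statistic VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        ("2012-11-01", "user", "total", "a", "", 5, 50, "day"),
        ("2012-11-03", "user", "total", "a", "", 7, 57, "day"),
        ("2012-11-02", "page", "view", "b", "x", 4, 40, "day"),
        ("2012-10-30", "page", "view", "b", "x", 9, 36, "day"),
    ],
)

# Join the table to a subquery holding the latest date of each group.
query = """
    SELECT * FROM statistic NATURAL JOIN (
        SELECT category, action, label, label2, MAX(date) AS date
        FROM statistic
        GROUP BY category, action, label, label2
    ) t
    ORDER BY category
"""
latest = conn.execute(query).fetchall()
for row in latest:
    print(row)
```

Because period is part of the primary key, a group can hold several rows with the same latest date (one per period); this query returns all of them.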
Then, if the following filters are still required, add them:
WHERE
(action='total' OR action='' OR category='user')
AND
(period='day'
OR (period='week' AND DATEDIFF(now(), `date`) > 30)
OR (period = 'Month' AND DATEDIFF(now(), `date`) > 7*26)
)
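For comparison, the LEFT JOIN approach from the question can also be made to work, although this is not what the answer proposes. It needs two changes: join each row s1 to the rows s2 of the same group with a later date (s1.date < s2.date rather than >), and keep only the rows for which no such s2 exists (WHERE s2.date IS NULL) instead of using GROUP BY. The sketch below again uses sqlite3 and hypothetical sample rows; the DATEDIFF filters are left out because SQLite has no DATEDIFF function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE statistic (date TEXT, category TEXT, action TEXT, label TEXT,"
    " label2 TEXT, count_today INTEGER, count_total INTEGER, period TEXT)"
)
# Hypothetical sample data.
conn.executemany(
    "INSERT INTO statistic VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        ("2012-11-01", "user", "total", "a", "", 5, 50, "day"),
        ("2012-11-03", "user", "total", "a", "", 7, 57, "day"),
        ("2012-11-02", "page", "view", "b", "x", 4, 40, "day"),
        ("2012-10-30", "page", "view", "b", "x", 9, 36, "day"),
    ],
)

# Pair each row s1 with every row s2 of the same group that has a LATER date.
# A row with no such partner (s2.date IS NULL) is the latest in its group.
query = """
    SELECT s1.* FROM statistic AS s1
    LEFT JOIN statistic AS s2
      ON  s1.category = s2.category
      AND s1.action   = s2.action
      AND s1.label    = s2.label
      AND s1.label2   = s2.label2
      AND s1.date     < s2.date
    WHERE s2.date IS NULL
    ORDER BY s1.category
"""
latest = conn.execute(query).fetchall()
for row in latest:
    print(row)
```

In both versions, filters placed in the outer WHERE are applied after the latest row has been found, so a group whose latest row fails a filter disappears entirely rather than falling back to an older row.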
Source: https://stackoverflow.com/questions/13451472/groupwise-maximum-query-for-getting-records-for-latest-date