JOIN, GROUP BY, ORDER BY

廉价感情. Submitted on 2019-12-18 06:56:39

Question


The problem I first had with the following query was that the group by clause was performed before the order by:

The saved.recipe_id column is an integer generated by UNIX_TIMESTAMP()

SELECT
    saved.recipe_id,
    saved.`date`,
    user.user_id
FROM saved
    JOIN user
        ON user.id = saved.user_id
GROUP BY saved.recipe_id
ORDER BY saved.`date` DESC

So I tried all sorts of possible solutions with subqueries and other hacks. In the end I was trying out some different subqueries in the join clause, which required me to swap which table was in the from clause and which was in the join clause. I decided to just try the following out:

SELECT
    saved.recipe_id,
    saved.`date`,
    user.user_id
FROM user
    JOIN saved
        ON user.id = saved.user_id
GROUP BY saved.recipe_id
ORDER BY saved.`date` DESC

For some reason this seems to order correctly, but why?
How can this change make my query sort more correctly than before?
Does it really, or does it just happen to work for the test cases I tried it against?


Answer 1:


So the problem I first had with the following query was that the group by clause was performed before the order by:

This is not a problem. This is how SQL is defined and how it operates: the group by creates a new set of rows, one per group, and the order by then sorts those rows.

There is no ordering issue here. There is an "understanding of SQL" issue. Your order by only orders the results of the query. Those results are produced by the group by, and the order of the joins has nothing to do with them.
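A minimal sketch of the join-order point, using Python's built-in sqlite3 module with SQLite as a stand-in for MySQL. The table layout and sample rows are hypothetical, loosely modelled on the question. With an inner join and only aggregated or grouped columns selected, swapping which table sits in FROM and which in JOIN yields the same grouped rows.

```python
import sqlite3

# Hypothetical schema and data; SQLite stands in for MySQL here.
SETUP = """
CREATE TABLE user  (id INTEGER PRIMARY KEY, user_id TEXT);
CREATE TABLE saved (recipe_id INTEGER, user_id INTEGER, `date` INTEGER);
INSERT INTO user VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
INSERT INTO saved VALUES
    (10, 1, 1000), (10, 2, 3000),
    (20, 3, 2000), (20, 1, 1500),
    (30, 2, 2500);
"""

# Same aggregate query; only the table order in FROM/JOIN differs.
SAVED_FIRST = """
SELECT saved.recipe_id, MAX(saved.`date`)
FROM saved JOIN user ON user.id = saved.user_id
GROUP BY saved.recipe_id
"""
USER_FIRST = """
SELECT saved.recipe_id, MAX(saved.`date`)
FROM user JOIN saved ON user.id = saved.user_id
GROUP BY saved.recipe_id
"""


def grouped_rows(sql):
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    # Compare as a set: there is no ORDER BY, so row order is unspecified.
    rows = set(conn.execute(sql).fetchall())
    conn.close()
    return rows


if __name__ == "__main__":
    print(grouped_rows(SAVED_FIRST) == grouped_rows(USER_FIRST))
    print(sorted(grouped_rows(SAVED_FIRST)))
```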

You are using a MySQL extension called Hidden Columns. This is when you have an aggregation query that has columns in the select (or having or order by clauses) that are not part of aggregation functions (sum(), etc) or part of the group by. Here is a quote from the documentation:

MySQL extends the use of GROUP BY so that the select list can refer to nonaggregated columns not named in the GROUP BY clause. This means that the preceding query is legal in MySQL. You can use this feature to get better performance by avoiding unnecessary column sorting and grouping. However, this is useful primarily when all values in each nonaggregated column not named in the GROUP BY are the same for each group. The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate. Furthermore, the selection of values from each group cannot be influenced by adding an ORDER BY clause. Sorting of the result set occurs after values have been chosen, and ORDER BY does not affect which values within each group the server chooses.

Presumably, you want the most recent date and user associated with that. The following query does what you want correctly and consistently:

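SELECT saved.recipe_id, max(saved.`date`) as MostRecentDate,
       -- ORDER BY inside GROUP_CONCAT puts the most recent user first in the list
       substring_index(group_concat(user.user_id ORDER BY saved.`date` DESC), ',', 1) as MostRecentUser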
FROM user JOIN
     saved
     ON user.id = saved.user_id
GROUP BY saved.recipe_id
ORDER BY max(saved.`date`) DESC;
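If you need the user from the exact row that holds the latest date, a common alternative is to compute each recipe's latest date in a derived table and join back on it. This is the "greatest-n-per-group" pattern, swapped in here for illustration; it is not part of the answer above. The sketch runs it against SQLite through Python's sqlite3 module, and the schema and data are hypothetical. The SQL itself uses no hidden columns and should also be valid in MySQL.

```python
import sqlite3

# Hypothetical schema and data; SQLite stands in for MySQL here.
SETUP = """
CREATE TABLE user  (id INTEGER PRIMARY KEY, user_id TEXT);
CREATE TABLE saved (recipe_id INTEGER, user_id INTEGER, `date` INTEGER);
INSERT INTO user VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
INSERT INTO saved VALUES
    (10, 1, 1000), (10, 2, 3000),
    (20, 3, 2000), (20, 1, 1500),
    (30, 2, 2500);
"""

# Find each recipe's latest date in a derived table, then join back to the
# saved row(s) carrying that date, so every selected column comes from one row.
LATEST_SAVE_PER_RECIPE = """
SELECT s.recipe_id, s.`date`, u.user_id
FROM saved AS s
JOIN (
    SELECT recipe_id, MAX(`date`) AS max_date
    FROM saved
    GROUP BY recipe_id
) AS latest
  ON latest.recipe_id = s.recipe_id
 AND latest.max_date  = s.`date`
JOIN user AS u ON u.id = s.user_id
ORDER BY s.`date` DESC
"""


def latest_saves():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    rows = conn.execute(LATEST_SAVE_PER_RECIPE).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in latest_saves():
        print(row)
```

If two saves of the same recipe share the maximum date, both rows are returned; add a tie-breaker column if you need exactly one row per recipe.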



Answer 2:


From what I remember, GROUP BY is always performed before ORDER BY. If you select any column that is neither in an aggregate function nor in the GROUP BY, the value returned for that column is indeterminate: the server may pick it from any row in the group. The correct ordering you see from the second query is accidental.

Instead of saved.date, use MAX(saved.date).

Then you get a deterministic value from every single group, and ORDER BY sorts those deterministic results.
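A small check of that claim, assuming SQLite (via Python's sqlite3) as a stand-in for MySQL and hypothetical sample rows. Because only the grouped column and MAX(date) are selected, loading the same rows in a different insertion order gives the same sorted result.

```python
import sqlite3

# Hypothetical sample data: (recipe_id, user row id, date).
ROWS = [
    (10, 1, 1000), (10, 2, 3000),
    (20, 3, 2000), (20, 1, 1500),
    (30, 2, 2500),
]

# Only the grouped column and an aggregate are selected: no hidden columns.
QUERY = """
SELECT saved.recipe_id, MAX(saved.`date`) AS most_recent
FROM user JOIN saved ON user.id = saved.user_id
GROUP BY saved.recipe_id
ORDER BY MAX(saved.`date`) DESC
"""


def run(rows):
    """Load rows into a fresh in-memory database (SQLite as a MySQL stand-in) and run QUERY."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE user (id INTEGER PRIMARY KEY, user_id TEXT)")
    conn.execute("CREATE TABLE saved (recipe_id INTEGER, user_id INTEGER, `date` INTEGER)")
    conn.executemany("INSERT INTO user VALUES (?, ?)",
                     [(1, "alice"), (2, "bob"), (3, "carol")])
    conn.executemany("INSERT INTO saved VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(run(ROWS))
    print(run(list(reversed(ROWS))))
```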



Source: https://stackoverflow.com/questions/18762726/join-group-by-order-by
