Does ORDER BY apply before or after DISTINCT?

孤人 提交于 2019-12-01 17:43:29

问题


In a MySQL query, when using the DISTINCT option, does ORDER BY apply after the duplicates are removed? If not, is there any way to make it do so? I think it's causing some issues with my code.

EDIT:
Here's some more information about what's causing my problem. I understand that, at first glance, this order would not be important, since I am dealing with duplicate rows. However, this is not entirely the case, since I am using an INNER JOIN to sort the rows.

Say I have a table of forum threads, containing this data:

+----+--------+-------------+
| id | userid |    title    |
+----+--------+-------------+
|  1 |      1 | Information |
|  2 |      1 | FAQ         |
|  3 |      2 | Support     |
+----+--------+-------------+

I also have a set of posts in another table like this:

+----+----------+--------+---------+
| id | threadid | userid | content |
+----+----------+--------+---------+
|  1 |        1 |      1 | Lorem   |
|  2 |        1 |      2 | Ipsum   |
|  3 |        2 |      2 | Test    |
|  4 |        3 |      1 | Foo     |
|  5 |        2 |      3 | Bar     |
|  6 |        3 |      5 | Bob     |
|  7 |        1 |      2 | Joe     |
+----+----------+--------+---------+

I am using the following MySQL query to get all threads, then sort them based on the latest post (assuming that posts with higher ids are more recent:

SELECT t.*
FROM Threads t
INNER JOIN Posts p ON t.id = p.threadid
ORDER BY p.id DESC

This works, and generates something like this:

+----+--------+-------------+
| id | userid |    title    |
+----+--------+-------------+
|  1 |      1 | Information |
|  3 |      2 | Support     |
|  2 |      1 | FAQ         |
|  3 |      2 | Support     |
|  2 |      1 | FAQ         |
|  1 |      1 | Information |
|  1 |      1 | Information |
+----+--------+-------------+

However, as you can see, the information is correct, but there are duplicate rows. I'd like to remove such duplicates, so I used SELECT DISTINCT instead. However, this yielded the following:

+----+--------+-------------+
| id | userid |    title    |
+----+--------+-------------+
|  3 |      2 | Support     |
|  2 |      1 | FAQ         |
|  1 |      1 | Information |
+----+--------+-------------+

This is obviously wrong, since the "Information" thread should be on top. It would seem that using DISTINCT causes the duplicates to be removed from the top to the bottom, so only the final rows are left. This causes some issues in the sorting.

Is this the case, or am I analyzing things incorrectly?


回答1:


Two things to understand:

  1. Generally speaking, resultsets are unordered unless you specify an ORDER BY clause; to the extent that you specify a non-strict order (i.e. ORDER BY over non-unique columns), the order in which records that are equal under that ordering appear within the resultset is undefined.

    I suspect you may be specifying such a non-strict order, which is the root of your problems: ensure that your ordering is strict by specifying ORDER BY over a set of columns that is sufficient to uniquely identify each record for which you care about its final position in the resultset.

  2. DISTINCT may use GROUP BY, which causes the results to be ordered by the grouped columns; that is, SELECT DISTINCT a, b, c FROM t will produce a resultset that appears as though ORDER BY a, b, c has been applied. Again, specifying a sufficiently strict order to meet your needs will override this effect.


Following your update, bearing in mind my point #2 above, it is clear that the effect of grouping the results to achieve DISTINCT makes it impossible to then order by the non-grouped column p.id; instead, you want:

SELECT   t.*
FROM     Threads t INNER JOIN Posts p ON t.id = p.threadid
GROUP BY t.id
ORDER BY MAX(p.id) DESC



回答2:


DISTINCT informs MySQL how to build a rowset for you, ORDER BY gives a hint how this rowset should by presented. So the answer is: DISTINCT first, ORDER BY last.




回答3:


The order in which DISTINCT and ORDER BY are applied, in most cases, will not affect the final output.

However, if you also use GROUP BY, this will affect the final output. In this case, the ORDER BY is performed after the GROUP BY, which will return unexpected results (assuming you expect the sort to be performed before the grouping).



来源:https://stackoverflow.com/questions/10905026/does-order-by-apply-before-or-after-distinct

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!