Question
I want to make sure that the order of the results from a subquery is preserved when using UNION DISTINCT. Please note that UNION DISTINCT is required to filter out duplicates while doing the union.
For example:
select columnA1, columnA2 from tableA order by [columnA3] asc
union distinct
select columnB1, columnB2 from tableB
When I run this, I expect the records from the first subquery (select columnA1, columnA2 from tableA order by [columnA3] asc) to come first, in the order produced by order by columnA3 asc, followed by those from tableB.
I am assuming that I cannot add another dummy column, because that would stop union distinct from removing rows that appear in both tables. So, this won't work:
select column1, column2 from
( select column1, column2, 1 as ORD from tableA order by [columnA3] asc
union distinct
select column1, column2, 2 as ORD from tableB
) order by ORD
Answer 1:
Essentially, MySQL does not preserve the order of records from a subquery when the UNION DISTINCT construct is used. After a bit of research, I found that it works if we add a LIMIT clause or use nested queries. Below are the two approaches:
Approach-1: Use a LIMIT clause
( select columnA1, columnA2 from tableA order by [columnA3] asc Limit 100000000 )
union distinct
( select columnB1, columnB2 from tableB )
I have tested this behavior using a few datasets and it seems to work consistently. The MySQL documentation ( http://dev.mysql.com/doc/refman/5.1/en/union.html ) also touches on this behavior: “Use of ORDER BY for individual SELECT statements implies nothing about the order in which the rows appear in the final result because UNION by default produces an unordered set of rows. Therefore, the use of ORDER BY in this context is typically in conjunction with LIMIT, so that it is used to determine the subset of the selected rows to retrieve for the SELECT, even though it does not necessarily affect the order of those rows in the final UNION result. If ORDER BY appears without LIMIT in a SELECT, it is optimized away because it will have no effect anyway.”
Please note that there is no particular reason for choosing a LIMIT of 100000000 other than picking a number high enough to cover all rows.
Approach-2: A nested query like the one below also works.
select column1, column2 from
( select column1, column2 from tableA order by [columnA3] asc ) alias1
union distinct
( select column1, column2 from tableB )
I couldn’t find a reason why the nested query works. There have been some references online (like the one from Phil McCarley at http://dev.mysql.com/doc/refman/5.0/en/union.html ) but no official documentation from MySQL.
Answer 2:
select column1, column2 from
( select column1, column2, 1 as ORD from tableA
union distinct
select tableB.column1, tableB.column2, 2 as ORD from tableB
LEFT JOIN tableA
ON tableA.column1 = tableB.column1 AND tableA.column2 = tableB.column2
WHERE tableA.column1 IS NULL
) t order by ORD
Note that UNION de-dupes not only across the separate sets but also within each set.
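The de-duplication behavior can be checked with a small script. This is only a sketch under assumptions: Python's built-in sqlite3 module stands in for MySQL (SQLite spells the distinct union as plain UNION), and the sample rows are made up for illustration. It shows that duplicates inside one table collapse, while a row present in both tables survives once the ORD column is added, which is why the second SELECT needs an explicit filter against tableA.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; its plain UNION is the distinct form.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (column1 INTEGER, column2 TEXT);
    CREATE TABLE tableB (column1 INTEGER, column2 TEXT);
    -- (1, 'x') is duplicated inside tableA; (2, 'y') appears in both tables.
    INSERT INTO tableA VALUES (1, 'x'), (1, 'x'), (2, 'y');
    INSERT INTO tableB VALUES (2, 'y'), (3, 'z'), (3, 'z');
""")

# Tag each row with its source table, with no filter on the second SELECT.
rows = conn.execute("""
    SELECT column1, column2, 1 AS ORD FROM tableA
    UNION
    SELECT column1, column2, 2 AS ORD FROM tableB
    ORDER BY ORD, column1
""").fetchall()

for row in rows:
    print(row)
```

In the output, (1, 'x') and (3, 'z') each appear once, but (2, 'y') appears twice, once per ORD value.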
Alternatively:
select column1, column2 from
( select column1, column2, 1 as ORD from tableA
union distinct
select column1, column2, 2 as ORD from tableB
WHERE (column1, column2) NOT IN (SELECT column1, column2 from tableA)
) t order by ORD
Source: https://stackoverflow.com/questions/7560091/preserving-the-order-of-records-from-subquery-while-using-union-distinct-const
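Both queries in this answer put tableA rows before tableB rows, but they do not sort the tableA rows by columnA3, which the question also asks for. One possible way to get both is to carry a sort key through the union and order by it in the outer query. The following is a hedged sketch, not a tested MySQL solution: it runs on Python's built-in sqlite3 module (where the distinct union is written as plain UNION), the sort_key column, the alias t, and the sample data are all hypothetical, and it swaps NOT IN for NOT EXISTS so that NULLs in tableA cannot empty the result.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; table contents are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (column1 INTEGER, column2 TEXT, columnA3 INTEGER);
    CREATE TABLE tableB (column1 INTEGER, column2 TEXT);
    INSERT INTO tableA VALUES (2, 'y', 10), (1, 'x', 20), (1, 'x', 30), (4, 'w', 5);
    INSERT INTO tableB VALUES (3, 'z'), (3, 'z'), (2, 'y');
""")

query = """
    SELECT column1, column2 FROM (
        -- One row per distinct pair in tableA, keeping its smallest columnA3.
        SELECT column1, column2, 1 AS ORD, MIN(columnA3) AS sort_key
        FROM tableA
        GROUP BY column1, column2
        UNION
        -- tableB rows that are not already in tableA; UNION removes repeats.
        SELECT b.column1, b.column2, 2 AS ORD, NULL AS sort_key
        FROM tableB AS b
        WHERE NOT EXISTS (
            SELECT 1 FROM tableA AS a
            WHERE a.column1 = b.column1 AND a.column2 = b.column2
        )
    ) AS t
    ORDER BY ORD, sort_key
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Because the ordering comes from the outer ORDER BY, this does not depend on how the optimizer treats an ORDER BY inside an individual SELECT. The order among tableB rows is left unspecified here, since the question does not ask for one.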