Which DBMSs allow an ORDER BY on an attribute that is not present in the SELECT clause?

Question

Let's assume I have a table called Cars with 2 columns: CarName, BrandName

Now I want to execute this query:

select CarName
from Cars
order by BrandName

As you can see, I'd like to return a list sorted by a column that is not present in the select part of the query.

The basic (not optimized) logical execution sequence of SQL clauses is: from, where, group by, having, select, order by.

The problem that arises is that BrandName isn't part of what is left after the select clause has been executed.

I've searched for this in books, on Google and on Stack Overflow, but so far I've only found several SO comments like "I know of database systems that don't allow it, but I don't remember which one".

So my questions are:
1) What do the standards SQL-92 or SQL:1999 say about this?
2) Which databases allow this query and which don't?

(Background: A couple of students asked this, and I want to give them the best answer possible)

EDIT:
- Successfully tested for Microsoft SQL Server 2012

Answer 1:

Your query is perfectly legal syntax: you can order by columns that are not present in the select.

Working Demo with MySQL
Working Demo with SQL Server
Working Demo with Postgresql
Working Demo with SQLite
Working Demo with Oracle
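
For anyone who wants to try this locally rather than through the demos, here is a minimal sketch using Python's built-in sqlite3 module. The Cars rows and the function name are made up for illustration (they are not from the question), and it only exercises SQLite, so it says nothing about other engines.

```python
import sqlite3


def car_names_ordered_by_brand():
    """Run the question's query against an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Cars (CarName TEXT, BrandName TEXT)")
    # Hypothetical sample rows, chosen so that sorting by BrandName
    # gives a different order than sorting by CarName would.
    conn.executemany(
        "INSERT INTO Cars (CarName, BrandName) VALUES (?, ?)",
        [("Yaris", "Toyota"), ("Accord", "Honda"), ("Golf", "Volkswagen")],
    )
    # BrandName is used for sorting only; it is not in the select list.
    rows = conn.execute(
        "SELECT CarName FROM Cars ORDER BY BrandName"
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]


if __name__ == "__main__":
    print(car_names_ordered_by_brand())
```

The names come back in brand order (Honda, Toyota, Volkswagen), even though BrandName itself is never returned.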

If you need the full specs about legal ordering, the SQL:2003 standard has a long list of statements about what the ORDER BY should and shouldn't contain (02-Foundation, page 415, section 7.13 <Query expression>, sub part 28). This confirms that your query is legal syntax.

I think your confusion could be arising from selecting and/or ordering by columns not present in the GROUP BY, or from ordering by columns not in the select when using DISTINCT.

Both have the same fundamental problem, and MySQL is the only DBMS, to my knowledge, that allows either.

The problem is this: when using GROUP BY or DISTINCT, any columns not contained in either are normally not needed, so it doesn't matter if they have multiple different values across the rows that get collapsed together. It only becomes a problem once you try to reference them. Imagine this simple data set:

ID  | Column1 | Column2  |
----|---------+----------|
1   |    A    |    X     |
2   |    A    |    Z     |
3   |    B    |    Y     |

If you write:

SELECT  DISTINCT Column1
FROM    T;

You would get

 Column1 
---------
     A   
     B

If you then add ORDER BY Column2, which of the two Column2 values would you use to order A by, X or Z? There is no deterministic way to choose a value for Column2.
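
To make the ambiguity concrete, here is a rough sketch (again with Python's sqlite3; the function name is my own) that resolves it explicitly with an aggregate. Rewriting the DISTINCT as a GROUP BY and ordering by MIN(Column2) or MAX(Column2) is one common way to state which value you mean; it is not the only one. Note that the two choices give opposite orders on the sample data, which is exactly why the engine cannot pick for you.

```python
import sqlite3


def column1_ordered_by(aggregate):
    """Return the distinct Column1 values ordered by MIN or MAX of Column2."""
    # Only allow known aggregate names, since the name is spliced into the SQL.
    if aggregate not in ("MIN", "MAX"):
        raise ValueError("aggregate must be 'MIN' or 'MAX'")
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE T (ID INTEGER PRIMARY KEY, Column1 TEXT, Column2 TEXT)"
    )
    # The sample data set from the answer.
    conn.executemany(
        "INSERT INTO T VALUES (?, ?, ?)",
        [(1, "A", "X"), (2, "A", "Z"), (3, "B", "Y")],
    )
    # GROUP BY Column1 yields the same rows as SELECT DISTINCT Column1,
    # but lets us say explicitly which Column2 value represents each group.
    rows = conn.execute(
        f"SELECT Column1 FROM T GROUP BY Column1 ORDER BY {aggregate}(Column2)"
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]


if __name__ == "__main__":
    print(column1_ordered_by("MIN"))  # A is represented by X, B by Y
    print(column1_ordered_by("MAX"))  # A is represented by Z, B by Y
```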

The same applies to selecting columns not in the group by. To simplify things just imagine the first two rows of the previous table:

ID  | Column1 | Column2  |
----|---------+----------|
1   |    A    |    X     |
2   |    A    |    Z     |

In MySQL you can write

SELECT  ID, Column1, Column2
FROM    T
GROUP BY Column1;

This actually breaks the SQL Standard, but it works in MySQL. The trouble is that it is non-deterministic. The result:

ID  | Column1 | Column2  |
----|---------+----------|
1   |    A    |    X     |

Is no more or less correct than

ID  | Column1 | Column2  |  
----|---------+----------|
2   |    A    |    Z     |

So what you are saying is "give me one row for each distinct value of Column1", which both result sets satisfy. So how do you know which one you will get? Well, you don't. It seems to be a fairly popular misconception that you can add an ORDER BY clause to influence the results, so that, for example, the following query:

SELECT  ID, Column1, Column2
FROM    T
GROUP BY Column1
ORDER BY ID DESC;

would ensure that you get the following result:

ID  | Column1 | Column2  |
----|---------+----------|
2   |    A    |    Z     |

because of the ORDER BY ID DESC. However, this is not true (as demonstrated here).

The MySQL documentation states:

The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate. Furthermore, the selection of values from each group cannot be influenced by adding an ORDER BY clause.

So even though you have an ORDER BY, it is not applied until after one row per group has been selected, and the choice of that one row is non-deterministic.
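
If what you actually want is "the row with the highest ID for each Column1", you have to say so in the query itself. Below is a sketch of one common way to do it, joining back to a subquery that picks MAX(ID) per group. This is my own illustration rather than something taken from the MySQL documentation, the names MaxID and latest_row_per_group are made up, and it is run against SQLite here purely for convenience.

```python
import sqlite3


def latest_row_per_group():
    """Return, for each Column1 value, the full row with the highest ID."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE T (ID INTEGER PRIMARY KEY, Column1 TEXT, Column2 TEXT)"
    )
    # The sample data set from the answer.
    conn.executemany(
        "INSERT INTO T VALUES (?, ?, ?)",
        [(1, "A", "X"), (2, "A", "Z"), (3, "B", "Y")],
    )
    # The subquery chooses exactly one ID per group, so the outer select
    # no longer leaves the choice of row up to the server.
    rows = conn.execute(
        """
        SELECT t.ID, t.Column1, t.Column2
        FROM T AS t
        JOIN (SELECT Column1, MAX(ID) AS MaxID
              FROM T
              GROUP BY Column1) AS m
          ON t.ID = m.MaxID
        ORDER BY t.ID DESC
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in latest_row_per_group():
        print(row)
```

Here the ORDER BY only sorts the final rows; the selection of one row per group is done entirely by the MAX(ID) subquery.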

The SQL Standard does allow columns in the select list that are not contained in the GROUP BY or in an aggregate function; however, these columns must be functionally dependent on a column in the GROUP BY. From the SQL-2003-Standard (5WD-02-Foundation-2003-09 - page 346) - http://www.wiscorp.com/sql_2003_standard.zip

15) If T is a grouped table, then let G be the set of grouping columns of T. In each <value expression> contained in <select list> , each column reference that references a column of T shall reference some column C that is functionally dependent on G or shall be contained in an aggregated argument of a <set function specification> whose aggregation query is QS.

For example, ID in the sample table is the PRIMARY KEY, so we know it is unique in the table. The following query therefore conforms to the SQL standard; it would run in MySQL but currently fails in many DBMSs (at the time of writing, PostgreSQL is the closest DBMS I know of to correctly implementing the standard - Example here):

SELECT  ID, Column1, Column2
FROM    T
GROUP BY ID;

Since ID is unique for each row, there can only be one value of Column1 and one value of Column2 for each ID, so there is no ambiguity about what to return for each row.
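
As a quick check of that reasoning, here is a small sketch with Python's sqlite3 (the function name is my own). One caveat: as far as I know SQLite accepts bare columns in a grouped query in general, so this does not show that SQLite checks functional dependency. It only shows that when you group by the primary key, each group holds exactly one row, so the result is fully determined.

```python
import sqlite3


def group_by_primary_key():
    """Group the sample table by its primary key and return all columns."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE T (ID INTEGER PRIMARY KEY, Column1 TEXT, Column2 TEXT)"
    )
    # The sample data set from the answer.
    conn.executemany(
        "INSERT INTO T VALUES (?, ?, ?)",
        [(1, "A", "X"), (2, "A", "Z"), (3, "B", "Y")],
    )
    # Every group contains a single row, so Column1 and Column2 each have
    # exactly one candidate value per group. ORDER BY ID is added only to
    # make the output order explicit.
    rows = conn.execute(
        "SELECT ID, Column1, Column2 FROM T GROUP BY ID ORDER BY ID"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in group_by_primary_key():
        print(row)
```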

Answer 2:

There's no logical reason why any RDBMS wouldn't let you do this. The usual restriction relates to SELECT DISTINCT, or the presence of a GROUP BY clause.

Current list of RDBMS known to support this:

Microsoft SQL Server 2012
Oracle
PostgreSQL
MySQL
DB2

Source: https://stackoverflow.com/questions/20356656/which-dbmss-allow-an-order-by-of-an-attribute-that-is-not-present-in-the-selec

Tags

mysql

sql-server

Oracle

db2

sql-order-by