Query insisting on columns in the 'Order By' clause match columns in the 'Group by' clause

问题

A bit of an odd one this one. I am trying to run the following query joining 3 tables.

Select t3.id, t3.name, t3.phone_no, t1.reg_dtm, count(t1.reg_dtm)
from tableA t1, tableB t2, tableC t3
Where t1.id = t2.id
And t2.id = t3.id
Group by t3.id, t3.name, t3.phone_no, t1.reg_dtm
Order by t2.id, t1.reg_dtm

The above query returns the following error

ORA-00979: not a GROUP BY expression

But if i change it so that everything in the group by clause is in the order by clause then it works.

Select t3.id, t3.name, t3.phone_no, t1.reg_dtm, count(t1.reg_dtm)
from tableA t1, tableB t2, tableC t3
Where t1.id = t2.id
And t2.id = t3.id
Group by t3.id, t3.name, t3.phone_no, t1.reg_dtm
Order by t3.id, t3.name, t3.phone_no, t1.reg_dtm

What exactly is the reason for this?

I think the problem is possibly because t2.id shown in the order by statement on the first query is not part of the group by statement. If this is the cause then why does it matter? I have never experienced this before and didn't think that there was any relationship between the group by and the order by statements.

I tested the above on Oracle 10G as well as MySQL.

Thanks in advance

回答1:

The ORDER BY clause runs after everything else in the SELECT statement executes; in a GROUPing scenario, the result set is limited to the columns used to aggregate the data. If you don't have a column specified in your initial resultset, the processing engine doesn't understand what to do with the requested output.

In other words, since your query doesn't return distinct values for t2.id and t1.id (since they're not used in the GROUP BY clause), the engine can't return the data in that order.

回答2:

The columns that are not mentioned in the GROUP BY clause technically should not be in the result set (unless they're aggregated, i.e. max(columnname))-- so how does it make sense to ORDER BY them? In other words, what would such a query even mean?

However, MySQL (I don't know about others) allows you to select columns that are not in the GROUP BY, which leads to lots of confusion for beginners wondering why they get strange query results.

As a sidenote, you may want to avoid implicit joins, as discussed here.

回答3:

In general, you cannot order by columns that are not in the GROUP BY clause and that are not aggregate functions in the SELECT list because the database has no way to deterministically sort the results. The database doesn't generally know how such a column would be aggregated in the final results so it doesn't know how to handle the case where one row in a result is, for example, the result of aggregating rows in the base table with values of 1 & 4 and another row in the result is the result of aggregating rows in the base table with values of 2 & 3. Either way it chooses to sort the results could be considered incorrect.

Now, in this particular case, since you're doing inner joins on ID between all three tables, the database could, in theory, be smart enough to recognize that ORDER BY t2.id, t1.id is semantically equivalent to ORDER BY t3.id, t3.id which can be evaluated deterministically. I don't know of any database, however, that has built this sort of query transformation into its optimizer. And the trade-off of potential bugs introduced when such an analysis goes wrong would tend to argue against including it when the queries you're trying to run don't make much logical sense from a set theory standpoint.

回答4:

You can order only by columns that are in the select clause - so an order by t3.ud, t1.reg_dtm should do it and would have the same semantic.

回答5:

Actually, if you read the SQL carefully, in the first statement you group by T3.ID, but sort by T2.ID.

If that's the real SQL that you're running, then your only problem is that you have a typo.

来源：https://stackoverflow.com/questions/8126058/query-insisting-on-columns-in-the-order-by-clause-match-columns-in-the-group

标签

mysql

sql

Oracle

oracle10g