T-SQL GROUP BY: Best way to include other grouped columns

问题

I'm a MySQL user who is trying to port some things over to MS SQL Server.

I'm joining a couple of tables, and aggregating some of the columns via GROUP BY.

A simple example would be employees and projects:

select empID, fname, lname, title, dept, count(projectID)
from employees E left join projects P on E.empID = P.projLeader
group by empID

...that would work in MySQL, but MS SQL is stricter and requires that everything is either enclosed in an aggregate function or is part of the GROUP BY clause.

So, of course, in this simple example, I assume I could just include the extra columns in the group by clause. But the actual query I'm dealing with is pretty complicated, and includes a bunch of operations performed on some of the non-aggregated columns... i.e., it would get REALLY ugly to try to include all of them in the group by clause.

So is there a better way to do this?

回答1:

You can get it to work with something around these lines:

select e.empID, fname, lname, title, dept, projectIDCount
from
(
   select empID, count(projectID) as projectIDCount
   from employees E left join projects P on E.empID = P.projLeader
   group by empID
) idList
inner join employees e on idList.empID = e.empID

This way you avoid the extra group by operations, and you can get any data you want. Also you have a better chance to make good use of indexes on some scenarios (if you are not returning the full info), and can be better combined with paging.

回答2:

"it would get REALLY ugly to try to include all of them in the group by clause."

Yup - that's the only way to do it * - just copy and paste the non-aggregated columns into the group by clause, remove the aliases and that's as good as it gets...

*you could wrap it in a nested SELECT but that's probably just as ugly...

回答3:

MySQL is unusual - and technically not compliant with the SQL standard - in allowing you to omit items from the GROUP BY clause. In standard SQL, each non-aggregate column in the select-list must be listed in full in the GROUP BY clause (either by name or by ordinal number, but that is deprecated).

(Oh, although MySQL is unusual, it is nice that it allows the shorthand.)

回答4:

You do not need join in the subquery as it not necessary to make group by based on empID from employees - you can do it on projectLeader field from projects.

With the inner join (as I put) you'll get list of employees that have at least one project. If you want list of all employees just change it to left join

  select e.empID, e.fname, e.lname, e.title, e.dept, p.projectIDCount
    from employees e 
   inner join ( select projLeader, count(*) as projectIDCount
                  from projects
                 group by projLeader
              ) p on p.projLeader = e.empID

回答5:

A subquery in the select clause might also be suitable. It would work for the example given but might not for the actual complicated query you are dealing with.

select
        e.empID, fname, lname, title, dept
        , (select count(*) from projects p where p.projLeader = e.empId) as projectCount
from
   from employees E

来源：https://stackoverflow.com/questions/626788/t-sql-group-by-best-way-to-include-other-grouped-columns

标签

sql-server

tsql

group-by

aggregate