Is there a standard for SQL aggregate function calculation?

Question

Is there a standard that governs how SQL implementations handle multiple calls to the same aggregate function in the same query?

For example, consider the following query, based on a popular example schema:

SELECT Customer, SUM(OrderPrice) FROM Orders
GROUP BY Customer
HAVING SUM(OrderPrice)>1000

Presumably, computing SUM(OrderPrice) takes time. Is that cost incurred for each reference to the aggregate function, or is the result computed once and reused within the query?

Or is there no standard for how SQL engines handle this case?

Answer 1:

Although I have worked with many different DBMSs, I will demonstrate this only on SQL Server. Consider the query below, which even includes a CAST inside the aggregated expression. The query plan shows that sum(cast(number as bigint)) is computed only once, as DEFINE:([Expr1005]=SUM([Expr1006])), and the HAVING filter simply reuses Expr1005.

set showplan_text on
go
select type, sum(cast(number as bigint))
from master..spt_values
group by type
having sum(cast(number as bigint)) > 100000
go

  |--Filter(WHERE:([Expr1005]>(100000)))
       |--Hash Match(Aggregate, HASH:([Expr1004]), RESIDUAL:([Expr1004] = [Expr1004]) DEFINE:([Expr1005]=SUM([Expr1006])))
            |--Compute Scalar(DEFINE:([Expr1004]=CONVERT(nchar(3),[mssqlsystemresource].[sys].[spt_values].[type],0), [Expr1006]=CONVERT(bigint,[mssqlsystemresource].[sys].[spt_values].[number],0)))
                 |--Index Scan(OBJECT:([mssqlsystemresource].[sys].[spt_values].[ix2_spt_values_nu_nc]))
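Read bottom to top, the plan has three steps: compute the cast values, aggregate them with one SUM per group, then filter on that stored sum. Here is a minimal Python sketch of that pipeline. It assumes made-up sample rows in place of master..spt_values, and the names expr1004, expr1005 and expr1006 are borrowed from the plan purely as labels. It illustrates the idea and is not how SQL Server is actually implemented.

```python
from collections import defaultdict

# Made-up sample rows standing in for master..spt_values: (type, number)
ROWS = [("P", 60000), ("P", 50000), ("L", 10), ("L", 20), ("V", 200000)]


def run_plan(rows, threshold=100000):
    # Compute Scalar: derive the grouping key (Expr1004) and the cast value (Expr1006)
    computed = ((str(t), int(n)) for t, n in rows)

    # Hash Match (Aggregate): exactly one SUM accumulator (Expr1005) per group
    expr1005 = defaultdict(int)
    for expr1004, expr1006 in computed:
        expr1005[expr1004] += expr1006

    # Filter: HAVING reads the stored Expr1005; nothing is summed a second time.
    # The SELECT list outputs that same Expr1005 value.
    return {key: total for key, total in expr1005.items() if total > threshold}


print(run_plan(ROWS))
```

The point of the sketch is that the HAVING comparison and the SELECT output both read the same dictionary entry, so the per-row additions happen once regardless of how many times the aggregate is referenced.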

That may not be obvious from the plan above, because the SUM in the SELECT list has no operator of its own, so I added *10 to the SELECT expression in the query below. The plan now includes one extra step, DEFINE:([Expr1006]=[Expr1005]*(10)) (plan steps run bottom to top), which shows that the new expression required an extra calculation. Even so, the aggregate is not recomputed: the engine just takes Expr1005 and multiplies it by 10.

set showplan_text on
go
select type, sum(cast(number as bigint))*10
from master..spt_values
group by type
having sum(cast(number as bigint)) > 100000
go

  |--Compute Scalar(DEFINE:([Expr1006]=[Expr1005]*(10)))
       |--Filter(WHERE:([Expr1005]>(100000)))
            |--Hash Match(Aggregate, HASH:([Expr1004]), RESIDUAL:([Expr1004] = [Expr1004]) DEFINE:([Expr1005]=SUM([Expr1007])))
                 |--Compute Scalar(DEFINE:([Expr1004]=CONVERT(nchar(3),[mssqlsystemresource].[sys].[spt_values].[type],0), [Expr1007]=CONVERT(bigint,[mssqlsystemresource].[sys].[spt_values].[number],0)))
                      |--Index Scan(OBJECT:([mssqlsystemresource].[sys].[spt_values].[ix2_spt_values_nu_nc]))
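The reuse in this second plan resembles common-subexpression elimination over aggregates. Below is a minimal Python illustration. It assumes, hypothetically, that duplicates are detected by comparing the aggregate expression text. This is not SQL Server's actual algorithm, and `assign_slots` and the `agg0` slot naming are made up for the example. Both references end up in one slot, each row is added only once, and *10 is applied after the filter, in the same order as the plan's operators.

```python
from collections import defaultdict

# Made-up sample rows standing in for master..spt_values: (type, number)
ROWS = [("P", 60000), ("P", 50000), ("L", 10), ("L", 20), ("V", 200000)]


def assign_slots(aggregate_refs):
    """Give each *distinct* aggregate expression one slot (hypothetical naming)."""
    slots = {}
    for ref in aggregate_refs:
        slots.setdefault(ref, f"agg{len(slots)}")
    return slots


def run_query(rows, threshold=100000):
    agg = "sum(cast(number as bigint))"
    # The SELECT list (inside *10) and HAVING reference the same aggregate text.
    slots = assign_slots([agg, agg])

    # Hash aggregate: one accumulator per distinct slot; count the additions done.
    totals = {slot: defaultdict(int) for slot in set(slots.values())}
    additions = 0
    for group, number in rows:
        for acc in totals.values():
            acc[group] += int(number)
            additions += 1

    shared = totals[slots[agg]]
    # Filter on the shared slot, then the post-aggregate scalar multiplies by 10.
    result = {g: total * 10 for g, total in shared.items() if total > threshold}
    return result, len(totals), additions


print(run_query(ROWS))
```

Keying slots by expression text is the simplest possible matching rule; a real optimizer works on parsed expression trees, but the effect shown here (one accumulator, two consumers) is the same one visible in the plan.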

The other major DBMSs (PostgreSQL, Sybase, Oracle, DB2, Firebird, MySQL) very likely work the same way.
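SQLite is not in that list, but Python's built-in sqlite3 module makes it easy to probe one engine empirically. This is a sketch under stated assumptions: the Orders rows are made up to match the question's schema, and `counting_sum` is a hypothetical aggregate registered with `Connection.create_aggregate` that behaves like SUM but counts its `step()` calls. If the engine merges the two identical references, the count equals the number of rows; if it evaluates each reference separately, the count doubles. The result describes only the SQLite version you run it on.

```python
import sqlite3

# Made-up rows for the question's Orders(Customer, OrderPrice) table
ORDERS = [("Hansen", 1000), ("Nilsen", 1600), ("Hansen", 700),
          ("Jensen", 300), ("Nilsen", 100)]


def probe(rows):
    calls = {"step": 0}

    class CountingSum:
        """Hypothetical SUM replacement that counts how often it is fed a row."""

        def __init__(self):
            self.total = 0

        def step(self, value):
            calls["step"] += 1
            self.total += value

        def finalize(self):
            return self.total

    con = sqlite3.connect(":memory:")
    con.create_aggregate("counting_sum", 1, CountingSum)
    con.execute("CREATE TABLE Orders (Customer TEXT, OrderPrice INTEGER)")
    con.executemany("INSERT INTO Orders VALUES (?, ?)", rows)
    result = con.execute(
        "SELECT Customer, counting_sum(OrderPrice) FROM Orders "
        "GROUP BY Customer HAVING counting_sum(OrderPrice) > 1000 "
        "ORDER BY Customer"
    ).fetchall()
    con.close()
    # Compare calls["step"] with len(rows) to see whether the references were merged
    return result, calls["step"]


print(probe(ORDERS))
```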

Source: https://stackoverflow.com/questions/12876873/is-there-a-standard-for-sql-aggregate-function-calculation

Tags

sql

performance

standards

aggregate-functions