How can I force a subquery to perform as well as a #temp table?

既然无缘 2020-12-17 08:21

I am reiterating the question asked by Mongus Pong, "Why would using a temp table be faster than a nested query?", which doesn't have an answer that works for me.

Mos

4 Answers
  • 2020-12-17 08:54

    I do not believe there is a query hint that instructs the engine to spool each subquery in turn.

    There is the OPTION (FORCE ORDER) query hint, which forces the engine to perform the JOINs in the order specified and could coax it into achieving that result in some instances. This hint will sometimes produce a more efficient plan for a complex query when the engine keeps insisting on a sub-optimal one. Of course, the optimizer should usually be trusted to determine the best plan.
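
    As a rough sketch of what that hint looks like in practice (the tables TableFoo and TableBar and their columns are hypothetical, chosen only for illustration), the query below asks the optimizer to keep the join order as written. It does not guarantee that the derived table is spooled or materialized.

    ```sql
    -- Hypothetical tables: TableFoo(Id, Deleted, CreatedOn), TableBar(Id, FooId).
    -- FORCE ORDER preserves the join order written in the FROM clause,
    -- so the derived table "foo" is the first join input.
    SELECT bar.*
    FROM   (SELECT Id FROM TableFoo WHERE Deleted = 1) AS foo
    JOIN   TableBar AS bar ON bar.FooId = foo.Id
    OPTION (FORCE ORDER);
    ```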

    Ideally there would be a query hint that would allow you to designate a CTE or subquery as "materialized" or "anonymous temp table", but there is not.

  • 2020-12-17 08:55

    There are a few possible explanations as to why you see this behavior. Some common ones are

    1. The subquery or CTE may be being repeatedly re-evaluated.
    2. Materialising partial results into a #temp table may force a better join order for that part of the plan by removing some possible options from the equation.
    3. Materialising partial results into a #temp table may improve the rest of the plan by correcting poor cardinality estimates.

    The most reliable method is simply to use a #temp table and materialize it yourself.
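
    A minimal sketch of doing that by hand, assuming hypothetical tables TableFoo and TableBar (the names, columns and index name are illustrative only):

    ```sql
    -- Step 1: materialize the intermediate result.
    -- Unlike a spool, #foo can have its own statistics.
    SELECT Id, CreatedOn
    INTO   #foo
    FROM   TableFoo
    WHERE  Deleted = 1;

    -- Optional: index the join column (hypothetical index name).
    CREATE CLUSTERED INDEX IX_foo_Id ON #foo (Id);

    -- Step 2: run the rest of the query against the materialized rows.
    SELECT bar.*
    FROM   #foo AS foo
    JOIN   TableBar AS bar ON bar.FooId = foo.Id
    WHERE  foo.CreatedOn > DATEADD(DAY, -7, GETUTCDATE());

    DROP TABLE #foo;
    ```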

    Failing that, for point 1 see "Provide a hint to force intermediate materialization of CTEs or derived tables". Using TOP(large_number) ... ORDER BY can often encourage the result to be spooled rather than repeatedly re-evaluated.

    Even if that works, however, there are no statistics on the spool.
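
    The TOP ... ORDER BY pattern might look like the sketch below. The table and column names are hypothetical, and whether the optimizer actually spools the CTE depends on the plan it chooses, so check the actual execution plan.

    ```sql
    -- Hypothetical tables: TableFoo(Id, Deleted, CreatedOn), TableBar(Id, FooId).
    -- 2147483647 is the largest INT value, so in practice no rows are cut off.
    WITH foo AS (
        SELECT TOP (2147483647) Id, CreatedOn
        FROM   TableFoo
        WHERE  Deleted = 1
        ORDER BY Id
    )
    SELECT bar.*
    FROM   foo
    JOIN   TableBar AS bar ON bar.FooId = foo.Id;
    ```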

    For points 2 and 3 you would need to analyse why you weren't getting the desired plan. Possibly rewriting the query to use sargable predicates, or updating statistics might get a better plan. Failing that you could try using query hints to get the desired plan.
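
    As an illustration of those two suggestions (again using a hypothetical TableFoo with a CreatedOn column), a predicate that wraps the column in a function can be rewritten as a range so that an index on the column can be used for a seek, and statistics can be refreshed explicitly:

    ```sql
    -- Non-sargable: the function call on the column prevents an index seek on CreatedOn.
    SELECT Id FROM TableFoo WHERE YEAR(CreatedOn) = 2020;

    -- Sargable rewrite: the bare column is compared against a half-open date range.
    SELECT Id
    FROM   TableFoo
    WHERE  CreatedOn >= '20200101' AND CreatedOn < '20210101';

    -- Refresh statistics so that cardinality estimates reflect the current data.
    UPDATE STATISTICS TableFoo WITH FULLSCAN;
    ```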

  • 2020-12-17 09:03

    Another option (for future readers of this article) is to use a user-defined function. Multi-statement table-valued functions (as described in "How to Share Data between Stored Procedures") appear to force SQL Server to materialize the results of your subquery. In addition, they let you specify primary keys and indexes on the resulting table to help the query optimizer. The function can then be used in a SELECT statement as part of your view. For example:

    CREATE FUNCTION SalesByStore (@storeid varchar(30))
       RETURNS @t TABLE (title varchar(80) NOT NULL PRIMARY KEY,
                         qty   smallint    NOT NULL)  AS
    BEGIN
       INSERT @t (title, qty)
          SELECT t.title, s.qty
          FROM   sales s
          JOIN   titles t ON t.title_id = s.title_id
          WHERE  s.stor_id = @storeid
       RETURN
    END
    GO

    -- CREATE VIEW must be the first statement in its batch, hence the GO above.
    CREATE VIEW SalesData AS
    SELECT * FROM SalesByStore('6380')
    
  • 2020-12-17 09:10

    Having run into this problem, I found out that (in my case) SQL Server was evaluating the conditions in the wrong order, because there was an index it could use for the outer condition (IDX_CreatedOn on TableFoo).

    SELECT bar.*
    FROM
        (SELECT * FROM TableFoo WHERE Deleted = 1) foo
        JOIN TableBar bar ON (bar.FooId = foo.Id)
    WHERE
        foo.CreatedOn > DATEADD(DAY, -7, GETUTCDATE())
    

    I managed to work around it by forcing the subquery to use another index (i.e. one that would be used when the subquery was executed without the parent query). In my case I switched to PK, which was meaningless for the query, but allowed the conditions from the subquery to be evaluated first.

    SELECT bar.*
    FROM
        (SELECT * FROM TableFoo WITH (INDEX([PK_Id])) WHERE Deleted = 1) foo
        JOIN TableBar bar ON (bar.FooId = foo.Id)
    WHERE
        foo.CreatedOn > DATEADD(DAY, -7, GETUTCDATE())
    

    Filtering by the Deleted column was really simple and filtering the few results by CreatedOn afterwards was even easier. I was able to figure it out by comparing the Actual Execution Plan of the subquery and the parent query.


    A hackier solution (and not really recommended) is to force the subquery to be executed first by limiting its results with TOP. This could lead to weird problems in the future if the subquery ever returns more rows than the limit, although you could always set the limit to something ridiculously large. Unfortunately, TOP 100 PERCENT can't be used for this purpose, since SQL Server just ignores it.
