How many times are the results of this common table expression evaluated?

有些话、适合烂在心里 提交于 2019-12-24 05:40:46

问题


I am trying to work out a bug we've found during our last iteration of testing. It involves a query which uses a common table expression. The main theme of the query is that it simulates a 'first' aggregate operation (get the first row for this grouping).

The problem is that the query seems to choose rows completely arbitrarily in some circumstances - multiple rows from the same group get returned, some groups simply get eliminated altogether. However, it always picks the correct number of rows.

I have created a minimal example to post here. There are clients and addresses, and a table which defines the relationships between them. This is a much simplified version of the actual query I'm looking at, but I believe it should have the same characteristics, and it is a good example to use to explain what I think is going wrong.

CREATE TABLE [Client] (ClientID int, Name varchar(20))
CREATE TABLE [Address] (AddressID int, Street varchar(20))
CREATE TABLE [ClientAddress] (ClientID int, AddressID int)

INSERT [Client] VALUES (1, 'Adam')
INSERT [Client] VALUES (2, 'Brian')
INSERT [Client] VALUES (3, 'Charles')
INSERT [Client] VALUES (4, 'Dean')
INSERT [Client] VALUES (5, 'Edward')
INSERT [Client] VALUES (6, 'Frank')
INSERT [Client] VALUES (7, 'Gene')
INSERT [Client] VALUES (8, 'Harry')

INSERT [Address] VALUES (1, 'Acorn Street')
INSERT [Address] VALUES (2, 'Birch Road')
INSERT [Address] VALUES (3, 'Cork Avenue')
INSERT [Address] VALUES (4, 'Derby Grove')
INSERT [Address] VALUES (5, 'Evergreen Drive')
INSERT [Address] VALUES (6, 'Fern Close')

INSERT [ClientAddress] VALUES (1, 1)
INSERT [ClientAddress] VALUES (1, 3)
INSERT [ClientAddress] VALUES (2, 2)
INSERT [ClientAddress] VALUES (2, 4)
INSERT [ClientAddress] VALUES (2, 6)
INSERT [ClientAddress] VALUES (3, 3)
INSERT [ClientAddress] VALUES (3, 5)
INSERT [ClientAddress] VALUES (3, 1)
INSERT [ClientAddress] VALUES (4, 4)
INSERT [ClientAddress] VALUES (4, 6)
INSERT [ClientAddress] VALUES (5, 1)
INSERT [ClientAddress] VALUES (6, 3)
INSERT [ClientAddress] VALUES (7, 2)
INSERT [ClientAddress] VALUES (8, 4)
INSERT [ClientAddress] VALUES (5, 6)
INSERT [ClientAddress] VALUES (6, 3)
INSERT [ClientAddress] VALUES (7, 5)
INSERT [ClientAddress] VALUES (8, 1)
INSERT [ClientAddress] VALUES (5, 4)
INSERT [ClientAddress] VALUES (6, 6)

;WITH [Stuff] ([ClientID], [Name], [Street], [RowNo]) AS
(
    SELECT
        [C].[ClientID],
        [C].[Name],
        [A].[Street],
        ROW_NUMBER() OVER (ORDER BY [A].[AddressID]) AS [RowNo]
    FROM
        [Client] [C] INNER JOIN
        [ClientAddress] [CA] ON
            [C].[ClientID] = [CA].[ClientID] INNER JOIN
        [Address] [A] ON
            [CA].[AddressID] = [A].[AddressID]
)
SELECT
    [CTE].[ClientID],
    [CTE].[Name],
    [CTE].[Street],
    [CTE].[RowNo]
FROM
    [Stuff] [CTE]
WHERE
    [CTE].[RowNo] IN (SELECT MIN([CTE2].[RowNo]) FROM [Stuff] [CTE2] GROUP BY [CTE2].[ClientID])
ORDER BY
    [CTE].[Name] ASC,
    [CTE].[Street] ASC

DROP TABLE [ClientAddress]
DROP TABLE [Address]
DROP TABLE [Client]

The query is designed to get all clients, and their first address (the address with the lowest ID). This appears to me that it should work.

I have a theory about why it sometimes will not work. The statement that follows the CTE refers to the CTE in two places. If the CTE is non-deterministic, and it gets run more than once, the result of the CTE may be different in the two places it's referenced.

In my example, the CTE's RowNo column uses ROW_NUMBER() with an order by clause that will potentially result in different orderings when run multiple times (we're ordering by address, the clients can be in any order depending on how the query is executed).

Because of this is it possible that CTE and CTE2 can contain different results? Or is the CTE only executed once and do I need to look elsewhere for the problem?


回答1:


It is not guaranteed in any way.

SQL Server is free to evaluate CTE each time it's accessed or cache the results, depending on the plan.

You may want to read this article:

  • Generating XML in subqueries

If your CTE is not deterministic, you will have to store its result in a temporary table or a table variable and use it instead of the CTE.

PostgreSQL, on the other hand, always evaluates CTEs only once, caching their results.



来源:https://stackoverflow.com/questions/3511353/how-many-times-are-the-results-of-this-common-table-expression-evaluated

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!