I came across this CTE solution for concatenating row elements and I thought it's brilliant and I realized how powerful CTEs can be.
However, in order to use such a tool effectively I need to know how it works internally to build that mental image which is essential for beginners, like me, to use it in different scenarios.
So I tried to slow motion the process of the above snippet and here is the code
USE [NORTHWIND]
GO
/****** Object: Table [dbo].[Products2] Script Date: 10/18/2011 08:55:07 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
IF OBJECT_ID('Products2','U') IS NOT NULL DROP TABLE [Products2]
CREATE TABLE [dbo].[Products2](
[ProductID] [int] IDENTITY(1,1) NOT NULL,
[ProductName] [nvarchar](40) NOT NULL,
[SupplierID] [int] NULL,
[CategoryID] [int] NULL,
[QuantityPerUnit] [nvarchar](20) NULL,
[UnitPrice] [money] NULL,
[UnitsInStock] [smallint] NULL,
[UnitsOnOrder] [smallint] NULL,
[ReorderLevel] [smallint] NULL,
[Discontinued] [bit] NOT NULL
) ON [PRIMARY]
GO
SET IDENTITY_INSERT [dbo].[Products2] ON
INSERT [dbo].[Products2] ([ProductID], [ProductName], [SupplierID], [CategoryID], [QuantityPerUnit], [UnitPrice], [UnitsInStock], [UnitsOnOrder], [ReorderLevel], [Discontinued]) VALUES (1, N'vcbcbvcbvc', 1, 4, N'10 boxes x 20 bags', 18.0000, 39, 0, 10, 0)
INSERT [dbo].[Products2] ([ProductID], [ProductName], [SupplierID], [CategoryID], [QuantityPerUnit], [UnitPrice], [UnitsInStock], [UnitsOnOrder], [ReorderLevel], [Discontinued]) VALUES (2, N'Changassad', 1, 1, N'24 - 12 oz bottles', 19.0000, 17, 40, 25, 0)
INSERT [dbo].[Products2] ([ProductID], [ProductName], [SupplierID], [CategoryID], [QuantityPerUnit], [UnitPrice], [UnitsInStock], [UnitsOnOrder], [ReorderLevel], [Discontinued]) VALUES (3, N'Aniseed Syrup', 1, 2, N'12 - 550 ml bottles', 10.0000, 13, 70, 25, 0)
INSERT [dbo].[Products2] ([ProductID], [ProductName], [SupplierID], [CategoryID], [QuantityPerUnit], [UnitPrice], [UnitsInStock], [UnitsOnOrder], [ReorderLevel], [Discontinued]) VALUES (4, N'Chef Anton''s Cajun Seasoning', 2, 2, N'48 - 6 oz jars', 22.0000, 53, 0, 0, 0)
INSERT [dbo].[Products2] ([ProductID], [ProductName], [SupplierID], [CategoryID], [QuantityPerUnit], [UnitPrice], [UnitsInStock], [UnitsOnOrder], [ReorderLevel], [Discontinued]) VALUES (5, N'Chef Anton''s Gumbo Mix', 10, 2, N'36 boxes', 21.3500, 0, 0, 0, 1)
SET IDENTITY_INSERT [dbo].[Products2] OFF
GO
IF OBJECT_ID('DELAY_EXEC','FN') IS NOT NULL DROP FUNCTION DELAY_EXEC
GO
CREATE FUNCTION DELAY_EXEC() RETURNS DATETIME
AS
BEGIN
DECLARE @I INT=0
WHILE @I<99999
BEGIN
SELECT @I+=1
END
RETURN GETDATE()
END
GO
WITH CTE (EXEC_TIME, CategoryID, product_list, product_name, length)
AS (SELECT dbo.DELAY_EXEC(),
CategoryID,
CAST('' AS VARCHAR(8000)),
CAST('' AS VARCHAR(8000)),
0
FROM Northwind..Products2
GROUP BY CategoryID
UNION ALL
SELECT dbo.DELAY_EXEC(),
p.CategoryID,
CAST(product_list + CASE
WHEN length = 0 THEN ''
ELSE ', '
END + ProductName AS VARCHAR(8000)),
CAST(ProductName AS VARCHAR(8000)),
length + 1
FROM CTE c
INNER JOIN Northwind..Products2 p
ON c.CategoryID = p.CategoryID
WHERE p.ProductName > c.product_name)
SELECT *
FROM CTE
ORDER BY EXEC_TIME
--SELECT CategoryId, product_list
-- FROM ( SELECT CategoryId, product_list,
-- RANK() OVER ( PARTITION BY CategoryId ORDER BY length DESC )
-- FROM CTE ) D ( CategoryId, product_list, rank )
-- WHERE rank = 1 ;
The commented block is the desired output for the concatenation problem but it's not the question here.
I've added a column EXEC_TIME to know which row got added first. The output doesn’t look right to me for two reasons
I thought there would be a redundant data because of the condition
p.ProductName > c.product_namein another word the first part of the CTE the empty rows are always less then values in the Product2 table so each time it runs it should bring a new set of already added rows once again. Does this make any sense?The hierarchy of data is really weird the last item should be the longest and look what is the last item? An item with
length=1?
Any expert to the rescue? Thanks in advance.
Sample Results
EXEC_TIME CategoryID product_list product_name length
----------------------- ----------- ------------------------------------------------------------------- --------------------------------- -----------
2011-10-18 12:46:14.930 1 0
2011-10-18 12:46:14.990 2 0
2011-10-18 12:46:15.050 4 0
2011-10-18 12:46:15.107 4 vcbcbvcbvc vcbcbvcbvc 1
2011-10-18 12:46:15.167 2 Aniseed Syrup Aniseed Syrup 1
2011-10-18 12:46:15.223 2 Chef Anton's Cajun Seasoning Chef Anton's Cajun Seasoning 1
2011-10-18 12:46:15.280 2 Chef Anton's Gumbo Mix Chef Anton's Gumbo Mix 1
2011-10-18 12:46:15.340 2 Chef Anton's Cajun Seasoning, Chef Anton's Gumbo Mix Chef Anton's Gumbo Mix 2
2011-10-18 12:46:15.400 2 Aniseed Syrup, Chef Anton's Cajun Seasoning Chef Anton's Cajun Seasoning 2
2011-10-18 12:46:15.463 2 Aniseed Syrup, Chef Anton's Gumbo Mix Chef Anton's Gumbo Mix 2
2011-10-18 12:46:15.520 2 Aniseed Syrup, Chef Anton's Cajun Seasoning, Chef Anton's Gumbo Mi Chef Anton's Gumbo Mix 3
2011-10-18 12:46:15.580 1 Changassad Changassad 1
The page Recursive Queries Using Common Table Expressions describes the logic of CTEs:
The semantics of the recursive execution is as follows:
Split the CTE expression into anchor and recursive members.
Run the anchor member(s) creating the first invocation or base result set (T0).
Run the recursive member(s) with Ti as an input and Ti+1 as an output.
Repeat step 3 until an empty set is returned.
Return the result set. This is a UNION ALL of T0 to Tn.
However, that's only the logical flow. As always, with SQL, the server is free to reorder operations as it sees fit, if the result will be "the same", and the reordering is perceived to provide the results more efficiently.
The presence of your function with side effects (causing a delay, then returning GETDATE()) isn't something that would normally be considered when deciding whether to reorder operations.
One obvious way in which the query may be reordered is that it may decide to start working on result set Ti+1 before it has fully created result set Ti - it may be more efficient to do this than to fully construct Ti first, since the new rows are definitely already in memory and have been accessed recently.
This is an interesting question that helped me better understand recursive CTEs too.
If you look at the execution plan you will see that a spool is used and that it has the WITH STACK property set. Which means that rows are read in a stack-like manner (Last In First Out)
So first the anchor part runs
EXEC_TIME CategoryID product_list
----------------------- ----------- --------------
2011-10-18 12:46:14.930 1
2011-10-18 12:46:14.990 2
2011-10-18 12:46:15.050 4
Then 4 is processed as that is the last row added. The JOIN returns 1 row that is added to the spool then this newly added row is processed. In this case the Join returns nothing so there is nothing additional added to the spool and it moves on to processing the CategoryID = 2 row.
This returns 3 rows which are added to the spool
Aniseed Syrup
Chef Anton's Cajun Seasoning
Chef Anton's Gumbo Mix
then each of these rows are processed in turn in a similar LIFO fashion with any child rows added being dealt with first before processing can move on to the sibling rows. Hopefully you can see how this recursive logic explains your observed results but just in case you can't a C# simulation
using System;
using System.Collections.Generic;
using System.Linq;
namespace Foo
{
internal class Bar
{
private static void Main(string[] args)
{
var spool = new Stack<Tuple<int, string, string>>();
//Add anchor elements
AddRowToSpool(spool, new Tuple<int, string, string>(1, "", ""));
AddRowToSpool(spool, new Tuple<int, string, string>(2, "", ""));
AddRowToSpool(spool, new Tuple<int, string, string>(4, "", ""));
while (spool.Count > 0)
{
Tuple<int, string, string> lastRowAdded = spool.Pop();
AddChildRows(lastRowAdded, spool);
}
Console.ReadLine();
}
private static void AddRowToSpool(Stack<Tuple<int, string, string>> spool,
Tuple<int, string, string> row)
{
Console.WriteLine("CategoryId={0}, product_list = {1}",
row.Item1,
row.Item3);
spool.Push(row);
}
private static void AddChildRows(Tuple<int, string, string> lastRowAdded,
Stack<Tuple<int, string, string>> spool)
{
int categoryId = lastRowAdded.Item1;
string productName = lastRowAdded.Item2;
string productList = lastRowAdded.Item3;
string[] products;
switch (categoryId)
{
case 1:
products = new[] {"Changassad"};
break;
case 2:
products = new[]
{
"Aniseed Syrup",
"Chef Anton's Cajun Seasoning",
"Chef Anton's Gumbo Mix "
};
break;
case 4:
products = new[] {"vcbcbvcbvc"};
break;
default:
products = new string[] {};
break;
}
foreach (string product in products.Where(
product => string.Compare(productName, product) < 0))
{
string product_list = string.Format("{0}{1}{2}",
productList,
productList == "" ? "" : ",",
product);
AddRowToSpool(spool,
new Tuple<int, string, string>
(categoryId, product, product_list));
}
}
}
}
Returns
CategoryId=1, product_list =
CategoryId=2, product_list =
CategoryId=4, product_list =
CategoryId=4, product_list = vcbcbvcbvc
CategoryId=2, product_list = Aniseed Syrup
CategoryId=2, product_list = Chef Anton's Cajun Seasoning
CategoryId=2, product_list = Chef Anton's Gumbo Mix
CategoryId=2, product_list = Chef Anton's Cajun Seasoning,Chef Anton's Gumbo Mix
CategoryId=2, product_list = Aniseed Syrup,Chef Anton's Cajun Seasoning
CategoryId=2, product_list = Aniseed Syrup,Chef Anton's Gumbo Mix
CategoryId=2, product_list = Aniseed Syrup,Chef Anton's Cajun Seasoning,Chef Anton's Gumbo Mix
CategoryId=1, product_list = Changassad
来源:https://stackoverflow.com/questions/7803640/how-cte-really-works