Is ORDER BY and ROW_NUMBER() deterministic?

遇见更好的自我 2020-12-19 07:01

I've used SQL in a couple of database engines from time to time over several years but have little theoretical knowledge, so my question could be very "noobish" for some of you. Bu

4 answers

挽巷 (original poster)

2020-12-19 07:40

I really love these types of questions since you can get into doing performance analysis.

First, let's create a sample [test] database with a [urls] table holding a million generated records.

See code below.

-- Switch databases
USE [master];
go

-- Create simple database
CREATE DATABASE [test];
go

-- Switch databases
USE [test];
go

-- Create simple table
CREATE TABLE [urls]
    (
      my_id INT IDENTITY(1, 1)
                PRIMARY KEY ,
      my_link VARCHAR(255) ,
      my_status VARCHAR(15)
    );
go

-- http://stackoverflow.com/questions/1393951/what-is-the-best-way-to-create-and-populate-a-numbers-table

-- Load table with 1M rows of data 
;
WITH    PASS0
          AS ( SELECT   1 AS C
               UNION ALL
               SELECT   1
             ),           --2 rows
        PASS1
          AS ( SELECT   1 AS C
               FROM     PASS0 AS A ,
                        PASS0 AS B
             ),  --4 rows
        PASS2
          AS ( SELECT   1 AS C
               FROM     PASS1 AS A ,
                        PASS1 AS B
             ),  --16 rows
        PASS3
          AS ( SELECT   1 AS C
               FROM     PASS2 AS A ,
                        PASS2 AS B
             ),  --256 rows
        PASS4
          AS ( SELECT   1 AS C
               FROM     PASS3 AS A ,
                        PASS3 AS B
             ),  --65536 rows
        PASS5
          AS ( SELECT   1 AS C
               FROM     PASS4 AS A ,
                        PASS4 AS B
             ),  --4,294,967,296 rows
        TALLY
          AS ( SELECT   ROW_NUMBER() OVER ( ORDER BY C ) AS Number
               FROM     PASS5
             )
    INSERT  INTO urls
            ( my_link ,
              my_status
            )
            SELECT 
      -- top 10 search engines + me
                    CASE ( Number % 11 )
                      WHEN 0 THEN 'www.ask.com'
                      WHEN 1 THEN 'www.bing.com'
                      WHEN 2 THEN 'www.duckduckgo.com'
                      WHEN 3 THEN 'www.dogpile.com'
                      WHEN 4 THEN 'www.webopedia.com'
                      WHEN 5 THEN 'www.clusty.com'
                      WHEN 6 THEN 'www.archive.org'
                      WHEN 7 THEN 'www.mahalo.com'
                      WHEN 8 THEN 'www.google.com'
                      WHEN 9 THEN 'www.yahoo.com'
                      ELSE 'www.craftydba.com'
                    END AS my_link ,

      -- ratings scale
                    CASE ( Number % 5 )
                      WHEN 0 THEN 'poor'
                      WHEN 1 THEN 'fair'
                      WHEN 2 THEN 'good'
                      WHEN 3 THEN 'very good'
                      ELSE 'excellent'
                    END AS my_status
            FROM    TALLY AS T
            WHERE   Number <= 1000000
go
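
Before measuring anything, it can be worth confirming that the load did what the comments say. A minimal sketch, assuming the [urls] table and data from the script above; the column aliases total_rows and rows_per_status are only illustrative names.

```sql
-- Sanity check: total number of rows loaded (the load filters on Number <= 1000000)
SELECT  COUNT(*) AS total_rows
FROM    urls;
go

-- Rows per status; the load derives my_status from Number % 5
SELECT   my_status ,
         COUNT(*) AS rows_per_status
FROM     urls
GROUP BY my_status
ORDER BY my_status;
go
```

Since my_status comes from Number % 5 over the values 1 to 1,000,000, each of the five statuses should account for 200,000 rows. That also means the queries tested later sort on a column with very many ties.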

Second, we always want to clear the buffer pool and plan cache before each run when doing performance analysis in a test environment, so that every query starts cold. We also turn on I/O and time statistics so the results can be compared.

See code below.

-- Show time & i/o
SET STATISTICS TIME ON
SET STATISTICS IO ON
GO

-- Remove clean buffers & clear plan cache
CHECKPOINT 
DBCC DROPCLEANBUFFERS 
DBCC FREEPROCCACHE
GO
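
DBCC FREEPROCCACHE with no arguments clears the plan cache for the whole instance, which is one reason to keep this on a test server. If the instance is shared, a narrower option may be available. A sketch, assuming SQL Server 2016 or later, where the last statement clears cached plans for the current database only:

```sql
-- Switch to the test database so the scoped statement targets it
USE [test];
go

-- Write dirty pages to disk so DROPCLEANBUFFERS can remove them from the buffer pool
CHECKPOINT;
DBCC DROPCLEANBUFFERS;

-- Assumption: SQL Server 2016+; clears the plan cache for the current database only
ALTER DATABASE SCOPED CONFIGURATION CLEAR PROCEDURE_CACHE;
go
```

Note that DBCC DROPCLEANBUFFERS still affects the buffer pool for the whole instance, so this only narrows the plan cache part.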

Third, we try the first T-SQL statement. Look at the execution plan and capture the statistics.

-- Try 1
SELECT * FROM urls ORDER BY my_status

/*
Table 'urls'. Scan count 5, logical reads 4987, physical reads 1, read-ahead reads 4918, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 3166 ms,  elapsed time = 8130 ms.
*/

Fourth, we try the second T-SQL statement. Do not forget to clear the plan cache and buffers again first. If you do not, the query takes less than 1 second because most of the data is already in memory. Look at the execution plan and capture the statistics.

-- Try 2
SELECT ROW_NUMBER() OVER (ORDER BY my_status) as my_rownum, * FROM urls

/*
Table 'urls'. Scan count 5, logical reads 4987, physical reads 1, read-ahead reads 4918, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 3276 ms,  elapsed time = 8414 ms.
*/

Last but not least, here is the fun part, the performance analysis.

1 - We can see that the second plan is a superset of the first. Both plans scan the clustered index and sort the data. Parallelism is used to put the results together.

2 - The second plan / query also needs to calculate the row number. It segments the data and calculates this scalar, so we end up with two more operators in the plan (in SQL Server these are typically Segment and Sequence Project).

So it is not surprising that the two run in nearly the same time: 8130 ms elapsed for the first plan and 8414 ms for the second, a difference of 284 ms (about 3.5%).

Always look at the query plan, both estimated and actual. They tell you what the engine is planning to do and what it actually does.
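
Estimated and actual plans can also be requested from T-SQL rather than from the SSMS toolbar. A sketch using SET SHOWPLAN_XML (estimated: the query is compiled but not executed) and SET STATISTICS XML (actual: the query runs and the plan is returned as an extra result set), applied to the second query from above:

```sql
-- Estimated plan: statements are compiled, not executed
SET SHOWPLAN_XML ON;
go

SELECT ROW_NUMBER() OVER (ORDER BY my_status) AS my_rownum, * FROM urls;
go

SET SHOWPLAN_XML OFF;
go

-- Actual plan: the query runs and the plan comes back alongside the results
SET STATISTICS XML ON;
go

SELECT ROW_NUMBER() OVER (ORDER BY my_status) AS my_rownum, * FROM urls;
go

SET STATISTICS XML OFF;
go
```

SET SHOWPLAN_XML has to be the only statement in its batch, which is why each SET sits between its own go separators.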

In this example, two different T-SQL statements come up with almost identical plans.
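
The timing comparison does not by itself settle the determinism question in the title. With only five distinct my_status values across a million rows, ORDER BY my_status leaves the order within each status unspecified, so the row numbers assigned to tied rows are not guaranteed to be the same from run to run. A minimal sketch, assuming the same urls table, that adds the primary key my_id as a tie-breaker so the ordering is total:

```sql
-- Deterministic ordering: my_id is unique, so no two rows tie
SELECT * FROM urls ORDER BY my_status, my_id;
go

-- Deterministic numbering: same tie-breaker inside the window's ORDER BY
SELECT   ROW_NUMBER() OVER (ORDER BY my_status, my_id) AS my_rownum, *
FROM     urls
ORDER BY my_rownum;
go
```

The outer ORDER BY in the second query is there because the window's ORDER BY only controls how the numbers are assigned, not the order in which the rows are returned.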

Sincerely

John

www.craftydba.com
