The last UNION in SQL ignores existing INDEX

Question

This is a refined problem I have been struggling with for a couple of days now.

I create a static table, build indexes on it, and then create a stored procedure that runs against it. My issue is bizarre, and I will do my best to explain it.

I run the same create and execute scripts across 194 databases. The vast majority run very quickly, but on a handful of databases they run exceptionally slowly.

Just so we are clear, here are the indexes:

CREATE UNIQUE CLUSTERED INDEX IX_ID
ON DC_DuplicateMatch(ID)
GO
CREATE INDEX IX_LastName_FirstName
ON DC_DuplicateMatch(ID, LastName, FirstName)
GO
CREATE INDEX IX_LastName_PostalCode
ON DC_DuplicateMatch(ID, LastName, PostalCode)
GO
CREATE INDEX IX_LastName_YearBorn 
ON DC_DuplicateMatch(ID, LastName, YearBorn)
GO
CREATE INDEX IX_FirstName_PostalCode 
ON DC_DuplicateMatch(ID, FirstName, PostalCode)
GO
CREATE INDEX IX_FirstName_YearBorn 
ON DC_DuplicateMatch(ID, FirstName, YearBorn)
GO
CREATE INDEX IX_PostalCode_YearBorn 
ON DC_DuplicateMatch(ID, PostalCode, YearBorn)
GO

And here is the stored procedure:

CREATE PROC dbo.DC_GetPotentialDuplicates
    @ID    int,   
    @FirstName  varchar(30),   
    @LastName   varchar(30),   
    @PostalCode varchar(10),
    @YearBorn   varchar(4)    
AS
SELECT *  
FROM    DC_DuplicateMatch WITH(INDEX(IX_LastName_FirstName))
WHERE   ID > @ID AND 
        (LastName   = @LastName   AND 
        FirstName  = @FirstName)
UNION            
SELECT  *
FROM    DC_DuplicateMatch WITH(INDEX(IX_LastName_PostalCode))
WHERE   ID > @ID AND 
        (LastName   = @LastName   AND 
        PostalCode = @PostalCode)
UNION            
SELECT  *
FROM    DC_DuplicateMatch WITH(INDEX(IX_LastName_YearBorn))
WHERE   ID > @ID AND 
        (LastName   = @LastName   AND 
        YearBorn   = @YearBorn)
UNION
SELECT  *
FROM    DC_DuplicateMatch WITH(INDEX(IX_FirstName_PostalCode))
WHERE   ID > @ID AND 
        (FirstName  = @FirstName  AND 
        PostalCode = @PostalCode)
UNION            
SELECT  *
FROM    DC_DuplicateMatch WITH(INDEX(IX_FirstName_YearBorn))
WHERE   ID > @ID AND 
        (FirstName  = @FirstName  AND 
        YearBorn   = @YearBorn)
UNION
SELECT  *
FROM    DC_DuplicateMatch WITH(INDEX(IX_PostalCode_YearBorn))
WHERE   ID > @ID AND 
        (PostalCode = @PostalCode AND 
        YearBorn   = @YearBorn)
GO

Table Definition

Column_name  Type     Computed  Length  Prec  Scale
ID           int      no        4       10    0
FirstName    varchar  no        30
LastName     varchar  no        30
PostalCode   char     no        10
YearBorn     varchar  no        4

This proc consistently runs faster on larger tables, while smaller tables "occasionally" run slower. Speeds range from 4,000 records/second ("fast") down to 70 records/second ("slow").

The thing is, if I add blank filler records to the target table at some point, without any other changes, the speed increases from 70 to closer to the 4,000 mark. It's as if the query plan is not being built properly based on the number of records in the table.

After running both the Database Engine Tuner and Performance Monitor, I have discovered that the problem is that SQL Server ignores the index on the last UNION branch and does a table scan on some queries. The execution plan specifically says I need to create the exact index I already have on the table (hence the INDEX hints).

So I removed the 6th SELECT from the UNION.

The issue remained; however, this time it complained that the missing index was again on the last table of the unions (which is the 5th listed above and worked without issue when there were 6 SELECTs in the UNION).

Any thoughts on why this is happening or what I can do to avoid it? (aside from adding in dummy blank records to increase table size or creating a fake 7th final union...both of which increase performance).

Answer 1:

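The existing indexes aren't that good because ID is the leading column. The only predicate on ID is the range ID > @ID, so the equality columns behind it can't be used to seek, which makes the indexes effectively unseekable in this case. Here's a better set: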

CREATE INDEX IX_LastName_FirstName
ON DC_DuplicateMatch(LastName, FirstName, ID) INCLUDE (PostalCode, YearBorn)
GO
CREATE INDEX IX_LastName_PostalCode
ON DC_DuplicateMatch(LastName, PostalCode, ID) INCLUDE (FirstName, YearBorn)
GO
CREATE INDEX IX_LastName_YearBorn 
ON DC_DuplicateMatch(LastName, YearBorn, ID) INCLUDE (FirstName, PostalCode)
GO
CREATE INDEX IX_FirstName_PostalCode 
ON DC_DuplicateMatch(FirstName, PostalCode, ID) INCLUDE (LastName, YearBorn)
GO
CREATE INDEX IX_FirstName_YearBorn 
ON DC_DuplicateMatch(FirstName, YearBorn, ID) INCLUDE (LastName, PostalCode)
GO
CREATE INDEX IX_PostalCode_YearBorn 
ON DC_DuplicateMatch(PostalCode, YearBorn, ID) INCLUDE (LastName, FirstName)
GO

These are perfect for this query.

That's 6 index seeks per query. We'd need to pull some serious tricks to beat this.
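
Note that these statements reuse the index names from the question, so running them against a table that still has the old definitions would fail on the name collision. A minimal sketch of one way around that, assuming the table lives in the dbo schema (the question does not say), is to rebuild each index under its existing name with DROP_EXISTING. Only the first index is shown, and the sample parameter values are made up for illustration.

```sql
-- Sketch: replace the old ID-leading definition in place.
-- DROP_EXISTING = ON requires that an index with this name already exists
-- on the table; the statement fails if it does not.
CREATE INDEX IX_LastName_FirstName
ON dbo.DC_DuplicateMatch (LastName, FirstName, ID)
INCLUDE (PostalCode, YearBorn)
WITH (DROP_EXISTING = ON);
GO

-- Report logical reads per statement so the effect can be compared
-- before and after the change.
SET STATISTICS IO ON;
GO

-- Hypothetical sample values, not taken from the question.
EXEC dbo.DC_GetPotentialDuplicates
    @ID         = 0,
    @FirstName  = 'John',
    @LastName   = 'Smith',
    @PostalCode = '12345',
    @YearBorn   = '1970';
GO
```

Because the index names are unchanged, the INDEX hints in the stored procedure keep resolving, although with covering indexes like these the hints may no longer be needed.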

Answer 2:

In my experience, most of the time when the query planner doesn't use an appropriate index, it's because UPDATE STATISTICS needs to be rerun for the optimizer to work correctly. This seems symptomatic, given that your query plan is bad for just some values.
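
Assuming SQL Server, which the CREATE UNIQUE CLUSTERED INDEX and WITH(INDEX(...)) syntax in the question suggests, refreshing the statistics and discarding the cached plan could look like the sketch below. The dbo schema for the table is an assumption, and FULLSCAN is just one sampling choice.

```sql
-- Rebuild all statistics on the table by reading every row
-- (the dbo schema is assumed; the question does not state it).
UPDATE STATISTICS dbo.DC_DuplicateMatch WITH FULLSCAN;
GO

-- Mark the procedure for recompilation so the next call builds a fresh plan.
EXEC sp_recompile N'dbo.DC_GetPotentialDuplicates';
GO

-- Optional: check when the statistics for one index were last updated
-- and how many rows they cover.
DBCC SHOW_STATISTICS ('dbo.DC_DuplicateMatch', IX_LastName_FirstName)
    WITH STAT_HEADER;
GO
```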

Since you're providing an index hint, though, this seems unlikely. Normally a hint would force the planner to use that index, but maybe that depends on your vendor.

Source: https://stackoverflow.com/questions/20408789/the-last-union-in-sql-ignores-existing-index

Tags

sql

stored-procedures

indexing

query-optimization

union