Why does my SQL Server UPSERT code sometimes not block?

Question

I have a table ImportSourceMetadata which I use to control an import batch process. It contains a PK column SourceId and a data column LastCheckpoint. The import batch process reads the LastCheckpoint for a given SourceId, performs some logic (on other tables), then updates the LastCheckpoint for that SourceId or inserts it if it doesn't exist yet.

Multiple instances of the process run at the same time, usually with disjoint SourceIds, and I need high parallelism for those cases. However, it can happen that two processes are started for the same SourceId; in that case, I need the instances to block each other.

Therefore, my code looks as follows:

BEGIN TRAN
SET TRANSACTION ISOLATION LEVEL READ COMMITTED
SELECT LastCheckpoint FROM ImportSourceMetadata WITH (UPDLOCK) WHERE SourceId = 'Source'

-- Perform some processing

-- UPSERT: if the SELECT above yielded no value, then
INSERT INTO ImportSourceMetadata(SourceId, LastCheckpoint) VALUES ('Source', '2013-12-21')
-- otherwise, we'd do this: UPDATE ImportSourceMetadata SET LastCheckpoint = '2013-12-21' WHERE SourceId = 'Source'

COMMIT TRAN

I'm using the transaction to achieve atomicity, but I can only use the READ COMMITTED isolation level (because of the parallelism requirements in the "Perform some processing" block). Therefore (and to avoid deadlocks), I'm including an UPDLOCK hint with the SELECT statement to achieve a "critical section" parameterized on the SourceId value.

Now, this works quite well most of the time, but I've managed to trigger primary key violation errors with the INSERT statement when starting a lot of parallel processes for the same SourceId with an empty database. I cannot reliably reproduce this, however, and I don't understand why it doesn't work.

I've found hints on the internet (e.g., here and here, in a comment) that I need to specify WITH (UPDLOCK,HOLDLOCK) (resp. WITH (UPDLOCK,SERIALIZABLE)) rather than just taking an UPDLOCK on the SELECT, but I don't really understand why that is. MSDN docs say,

UPDLOCK
Specifies that update locks are to be taken and held until the transaction completes.

An update lock that is taken and held until the transaction completes should be enough to block a subsequent INSERT, and in fact, when I try it out in SQL Server Management Studio, it does indeed block my insert. However, in some rare cases, it seems to suddenly not work any more.

So, why exactly is it that UPDLOCK is not enough, and why is it enough in 99% of my test runs (and when simulating it in SQL Server Management Studio)?

Update: I've now found I can reproduce the non-blocking behavior reliably by executing the code above in two different windows of SQL Server Management Studio simultaneously up to just before the INSERT, but only the first time after creating the database. After that (even though I deleted the contents of the ImportSourceMetadata table), the SELECT WITH (UPDLOCK) will indeed block and the code no longer fails. Indeed, in sys.dm_tran_locks, I can see a U-lock taken even though the row does not exist on subsequent test runs, but not on the first run after creating the table.

This is a complete sample to show the difference in locks between a "newly created table" and an "old table":

DROP TABLE ImportSourceMetadata
CREATE TABLE ImportSourceMetadata(SourceId nvarchar(50) PRIMARY KEY, LastCheckpoint datetime)

BEGIN TRAN
SET TRANSACTION ISOLATION LEVEL READ COMMITTED
SELECT LastCheckpoint FROM ImportSourceMetadata WITH (UPDLOCK) WHERE SourceId='Source'

SELECT *
FROM sys.dm_tran_locks l
JOIN sys.partitions p
  ON l.resource_associated_entity_id = p.hobt_id
JOIN sys.objects o
  ON p.object_id = o.object_id

INSERT INTO ImportSourceMetadata VALUES('Source', '2013-12-21')
ROLLBACK TRAN

BEGIN TRAN
SET TRANSACTION ISOLATION LEVEL READ COMMITTED
SELECT LastCheckpoint FROM ImportSourceMetadata WITH (UPDLOCK) WHERE SourceId='Source'

SELECT *
FROM sys.dm_tran_locks l
JOIN sys.partitions p
  ON l.resource_associated_entity_id = p.hobt_id
JOIN sys.objects o
  ON p.object_id = o.object_id

ROLLBACK TRAN

On my system (with SQL Server 2012), the first query shows no locks on ImportSourceMetadata, but the second query shows a KEY lock on ImportSourceMetadata.

In other words, HOLDLOCK is indeed required, but only if the table was freshly created. Why's that?

Answer 1:

You also need HOLDLOCK.

If the row does exist, then your SELECT statement will take out a U lock on at least that row and retain it until the end of the transaction.

If the row doesn't exist, there is no row on which to take and hold a U lock, so you aren't locking anything. HOLDLOCK will lock at least the range where the row would fit.

Without HOLDLOCK, two concurrent transactions can both do the SELECT for a non-existent row, retain no conflicting locks, and both move on to the INSERT. The second INSERT then fails with the primary key violation.
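
As a rough sketch of the fix described above (not tested against your schema), the pattern from the question could be written with both hints as follows. The variables @LastCheckpoint and @RowFound are hypothetical names introduced here for illustration; the table, columns and values come from the question. With HOLDLOCK, the SELECT should typically take a key-range lock (RangeS-U) covering the place where the missing key would go, so a second transaction running the same SELECT waits instead of proceeding to the INSERT.

```sql
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;

BEGIN TRAN;

-- Hypothetical helper variables, not part of the original code.
DECLARE @LastCheckpoint datetime;
DECLARE @RowFound bit = 0;

-- UPDLOCK + HOLDLOCK: lock the row if it exists, otherwise the key range
-- where it would be inserted. @RowFound stays 0 when no row matches,
-- because the assignments only run for returned rows.
SELECT @LastCheckpoint = LastCheckpoint,
       @RowFound       = 1
FROM   ImportSourceMetadata WITH (UPDLOCK, HOLDLOCK)
WHERE  SourceId = N'Source';

-- Perform some processing

IF @RowFound = 1
    UPDATE ImportSourceMetadata
    SET    LastCheckpoint = '2013-12-21'
    WHERE  SourceId = N'Source';
ELSE
    INSERT INTO ImportSourceMetadata (SourceId, LastCheckpoint)
    VALUES (N'Source', '2013-12-21');

COMMIT TRAN;
```

The HOLDLOCK hint applies serializable semantics only to this one table reference, so the rest of the transaction, including the processing block, still runs under READ COMMITTED.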

Regarding the repro in your question, it seems the "row doesn't exist" issue is a bit more complex than I first thought.

If the row previously existed and has since been logically deleted, but still physically exists on the page as a "ghost" record, then the U lock can still be taken out on the ghost. That explains the blocking you are seeing.

You can use DBCC PAGE to see ghost records, as in this slight amendment to your code.

SET NOCOUNT ON;

DROP TABLE ImportSourceMetadata

CREATE TABLE ImportSourceMetadata
  (
     SourceId       NVARCHAR(50),
     LastCheckpoint DATETIME,
     PRIMARY KEY(SourceId)
  )

BEGIN TRAN

SET TRANSACTION ISOLATION LEVEL READ COMMITTED

SELECT LastCheckpoint
FROM   ImportSourceMetadata WITH (UPDLOCK)
WHERE  SourceId = 'Source'

INSERT INTO ImportSourceMetadata
VALUES      ('Source',  '2013-12-21')

DECLARE @DBCCPAGE NVARCHAR(100)

SELECT TOP 1 @DBCCPAGE = 'DBCC PAGE(0,' + CAST(file_id AS VARCHAR) + ',' + CAST(page_id AS VARCHAR) + ',3) WITH NO_INFOMSGS'
FROM   ImportSourceMetadata
       CROSS APPLY  sys.fn_physloccracker(%%physloc%%)

ROLLBACK TRAN

DBCC TRACEON(3604)

EXEC (@DBCCPAGE)

DBCC TRACEOFF(3604)

The SSMS messages tab shows

Slot 0 Offset 0x60 Length 31

Record Type = GHOST_DATA_RECORD      Record Attributes =  NULL_BITMAP VARIABLE_COLUMNS
Record Size = 31                     
Memory Dump @0x000000001215A060

0000000000000000:   3c000c00 00000000 9ba20000 02000001 †<.......¢...... 
0000000000000010:   001f0053 006f0075 00720063 006500††††...S.o.u.r.c.e.  

Slot 0 Column 1 Offset 0x13 Length 12 Length (physical) 12
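
If you would rather not read page dumps, a less invasive check is to ask sys.dm_db_index_physical_stats for the ghost record count of the table. This is only a sketch: the variable names are made up for illustration, and the count may already be back to zero if the background ghost cleanup task has run in the meantime.

```sql
-- Hypothetical variable names; the table name comes from the question.
DECLARE @DbId  INT = DB_ID();
DECLARE @ObjId INT = OBJECT_ID(N'ImportSourceMetadata');

-- ghost_record_count is only populated in SAMPLED or DETAILED mode.
SELECT index_id,
       record_count,
       ghost_record_count,
       version_ghost_record_count
FROM   sys.dm_db_index_physical_stats(@DbId, @ObjId, NULL, NULL, 'DETAILED')
WHERE  index_level = 0; -- leaf level only
```

Run immediately after the ROLLBACK in the script above, a non-zero ghost_record_count would be consistent with the GHOST_DATA_RECORD shown in the DBCC PAGE output.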

Source: https://stackoverflow.com/questions/20720412/why-does-my-sql-server-upsert-code-sometimes-not-block

Tags

sql-server

parallel-processing

isolation-level

upsert