Inserting records into DB table in multithreaded environment

Question

I have this T-SQL code that checks whether the value 'sadsadsad' exists and, if not, inserts it into the table.

if not exists(select id from [ua_subset_composite] where ua = 'sadsadsad')
  begin
    insert into [ua_subset_composite]
    select 'sadsadsad',1,null,null,null,null
  end

My concern is that in production, where multiple threads will be running concurrently, another thread may insert the same record between the not exists select and the insert.

I don't want to add a unique constraint on the column, and I am wondering whether I can improve this SQL code so that it guarantees uniqueness.

Answer 1:

One way to address this is to use a higher level of isolation (i.e. locking). You could wrap your entire statement in a transaction and use a stricter isolation level.

For example:

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

BEGIN TRANSACTION

   <your code here>

COMMIT TRANSACTION
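
As a rough sketch of the template above with the placeholder filled in, the check-and-insert from the question could be wrapped as follows. This only illustrates the shape of the transaction. The trailing reset is an addition that assumes the connection normally runs at READ COMMITTED (the SQL Server default), since the isolation level stays in effect for the session until it is changed.

```sql
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

BEGIN TRANSACTION;

-- The existence check and the insert now run inside one serializable transaction.
IF NOT EXISTS (SELECT id FROM [ua_subset_composite] WHERE ua = 'sadsadsad')
BEGIN
    INSERT INTO [ua_subset_composite]
    SELECT 'sadsadsad', 1, NULL, NULL, NULL, NULL;
END;

COMMIT TRANSACTION;

-- Assumption: the session otherwise uses the default level, so restore it.
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
```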

Answer 2:

You could implement a locking strategy on your database. You have two choices.

Pessimistic locking: you lock the record for your exclusive use until you have finished with it. It gives much stronger integrity than optimistic locking, but you must design the application carefully to avoid deadlocks.

Optimistic locking: you read a record, note its version number, and check that the version has not changed before you write the record back. When you write, you filter the update on the version and update the version in the same statement, so the check and the write are atomic (the record cannot be updated between the version check and the write to disk).

If the record is dirty (i.e. its version differs from yours), you abort the transaction and the user can restart it.
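
A minimal sketch of the optimistic variant, assuming a hypothetical table dbo.item with columns id, name and an integer version column (none of these names come from the question):

```sql
-- Hypothetical table: dbo.item (id int primary key, name nvarchar(50), version int)
DECLARE @id int = 1, @seen_version int;

-- 1. Read the row and remember the version that was seen.
SELECT @seen_version = version
FROM dbo.item
WHERE id = @id;

-- 2. Write back only if the version is unchanged, bumping it in the same
--    statement so the check and the write are a single atomic operation.
UPDATE dbo.item
SET name = N'new name',
    version = version + 1
WHERE id = @id
  AND version = @seen_version;

-- 3. Zero affected rows means another session changed the row first.
IF @@ROWCOUNT = 0
    RAISERROR('Row was modified by another session; please retry.', 16, 1);
```

Note that this pattern protects updates to a row that already exists. It does not by itself stop two sessions from inserting the same new value.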

Answer 3:

When you perform the select, place an updlock and a holdlock on the range being selected:

begin transaction

if not exists(
    select id 
    from [ua_subset_composite] with (updlock, holdlock) 
    where ua = 'sadsadsad')
  begin
    insert into [ua_subset_composite]
    select 'sadsadsad',1,null,null,null,null
  end

commit

The holdlock, equivalent to the serializable isolation level, will have the following effect:

Statements cannot read data that has been modified but not yet committed by other transactions.

No other transactions can modify data that has been read by the current transaction until the current transaction completes.

Other transactions cannot insert new rows with key values that would fall in the range of keys read by any statements in the current transaction until the current transaction completes.

Range locks are placed in the range of key values that match the search conditions of each statement executed in a transaction. This blocks other transactions from updating or inserting any rows that would qualify for any of the statements executed by the current transaction. This means that if any of the statements in a transaction are executed a second time, they will read the same set of rows. The range locks are held until the transaction completes. This is the most restrictive of the isolation levels because it locks entire ranges of keys and holds the locks until the transaction completes. Because concurrency is lower, use this option only when necessary.

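The updlock is needed in addition to the holdlock. Holdlock alone takes shared range locks, which two sessions can hold at the same time; both would then pass the not exists check and block each other on the insert, ending in a deadlock. Adding the updlock prevents a separate process from performing its own select with (updlock, holdlock) on the same range at the same time.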

Answer 4:

This is what I ended up doing:

-- note: a bare "nolock" (no parentheses) is parsed as a table alias, not as a hint
insert into [ua_subset_composite] WITH (TABLOCKX) (ua, os)
select @r, 1
where not exists (select 1 from [ua_subset_composite] nolock where ua = @r)

To test it, I ran this code concurrently from multiple windows:

declare @r nvarchar(30);
while (1=1)
begin
    -- style 21 formats the time to the millisecond, so concurrent windows often produce the same value
    set @r = convert(nvarchar(30), getdate(), 21)
    insert into [ua_subset_test] WITH (TABLOCKX) (ua, os)
    select @r, 1
    where not exists (select 1 from [ua_subset_test] nolock where ua = @r)
end

Answer 5:

Unfortunately, none of the above answers is correct. Beware of any "locking" solution that starts with BEGIN TRAN followed by a SELECT. Yes, if the isolation level is SERIALIZABLE, the SELECT creates locks that prevent other processes from updating the selected data. But what if no data are selected? What's to lock?

In other words, BEGIN TRAN sets up a race condition:

/* spid */
 /* 1 */   SELECT ... -- returns no rows
 /* 2 */   SELECT ... -- returns no rows
 /* 1 */   INSERT ... -- whew! 
 /* 2 */   INSERT ... -- error

To read before writing (say, to present data to the user), there's the special timestamp data type. In your case, though, it's just an insert. Use an atomic transaction, i.e. a single statement:

-- a VALUES clause cannot take a WHERE, so the row comes from a SELECT
insert into [ua_subset_composite] (column1, column2)
select 'sadsadsad', 1
where not exists (
    select 1 from [ua_subset_composite]
    where column1 = 'sadsadsad'
)

The server guarantees the row either is or is not inserted. The locking is done for you, where it needs to be, for the shortest possible time, by people who know how. :-)

"I don't want to add a unique constraint"

Well, you probably should, you know. The above code will prevent trying to add a nonunique value, and avoid an error message. A unique constraint will prevent someone less careful from succeeding.
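
For illustration, the constraint and a tolerant insert might look like the sketch below. It assumes the column is named ua and that an os column exists, as in Answer 4; the constraint name UQ_ua_subset_composite_ua is made up for this example.

```sql
-- Assumed column name (ua) and a made-up constraint name.
ALTER TABLE [ua_subset_composite]
    ADD CONSTRAINT UQ_ua_subset_composite_ua UNIQUE (ua);

BEGIN TRY
    -- Same single-statement insert as above; the constraint is the safety net.
    INSERT INTO [ua_subset_composite] (ua, os)
    SELECT 'sadsadsad', 1
    WHERE NOT EXISTS (
        SELECT 1 FROM [ua_subset_composite] WHERE ua = 'sadsadsad'
    );
END TRY
BEGIN CATCH
    -- 2627 = unique constraint violation, 2601 = duplicate key in a unique index.
    -- Swallow only those; re-raise anything else.
    IF ERROR_NUMBER() NOT IN (2627, 2601)
        THROW;
END CATCH;
```

With the constraint in place, a concurrent duplicate can at worst raise an error that the caller ignores. It can never produce a second row.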

Source: https://stackoverflow.com/questions/14242926/inserting-records-into-db-table-in-multithreaded-environment

Tags

sql

tsql