What is the best practice for inserting a record if it doesn't already exist?

Submitted by ╄→尐↘猪︶ㄣ on 2019-12-17 09:57:35

Question


I know at least three ways to insert a record if it doesn't already exist in a table:

  1. The first one is using IF NOT EXISTS:

    IF NOT EXISTS(select 1 from table where <condition>)
        INSERT...VALUES
    
  2. The second one is using merge:

    MERGE table AS target  
    USING (SELECT values) AS source 
    ON (condition)  
    WHEN NOT MATCHED THEN  
    INSERT ... VALUES ...
    
  3. The third one is using insert...select:

    INSERT INTO table (<values list>)
    SELECT <values list>
    WHERE NOT EXISTS(select 1 from table where <condition>)
    

But which one is the best?

The first option does not seem to be safe under concurrency: if two or more users try to insert the same record, one of them can insert it between the select statement in the if and the insert statement that follows.
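The gap in the first option can be made visible with a small experiment. This is only a sketch: `dbo.RaceDemo` is a hypothetical table with no unique constraint, the `WAITFOR DELAY` is there purely to widen the window between the check and the insert, and `GO` is the batch separator used by SSMS and sqlcmd rather than a T-SQL statement.

```sql
-- Hypothetical demo table; deliberately has no UNIQUE or PRIMARY KEY constraint.
CREATE TABLE dbo.RaceDemo (KeyCol INT NOT NULL);
GO

-- Run this batch in two separate sessions within a few seconds of each other.
IF NOT EXISTS (SELECT 1 FROM dbo.RaceDemo WHERE KeyCol = 1)
BEGIN
    -- Artificial pause between the existence check and the insert.
    WAITFOR DELAY '00:00:05';
    INSERT INTO dbo.RaceDemo (KeyCol) VALUES (1);
END;
GO

-- Afterwards, in either session: count how many copies of the key exist.
SELECT COUNT(*) AS CopiesOfKey1 FROM dbo.RaceDemo WHERE KeyCol = 1;
```

If both sessions pass the check before either one inserts, the final count can come back as 2 instead of 1, which is the duplicate the question is worried about.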

As for the second option, merge seems to be overkill for this, as the documentation states:

Performance Tip: The conditional behavior described for the MERGE statement works best when the two tables have a complex mixture of matching characteristics. For example, inserting a row if it does not exist, or updating the row if it does match. When simply updating one table based on the rows of another table, improved performance and scalability can be achieved with basic INSERT, UPDATE, and DELETE statements.

So I think the third option is the best for this scenario (only insert the record if it doesn't already exist, no need to update if it does), but I would like to know what SQL Server experts think.

Please note that after the insert, I don't need to know whether the record was already there or is brand new; I just need it to be there so that I can carry on with the rest of the stored procedure.


Answer 1:


When you need to guarantee the uniqueness of records on a condition that cannot be expressed by a UNIQUE or PRIMARY KEY constraint, you do need to make sure that the existence check and the insert are done in one transaction. You can achieve this by either:

  1. Using one SQL statement performing the check and the insert (your third option)
  2. Using a transaction with the appropriate isolation level

There is a fourth way though that will help you better structure your code and also make it work in situations where you need to process a batch of records at once. You can create a TABLE variable or a temporary table, insert all of the records that need to be inserted in there and then write the INSERT, UPDATE and DELETE statements based on this variable.

Below is (pseudo)code demonstrating this approach:

    -- Logic to create the data to be inserted if necessary

    DECLARE @toInsert TABLE (idCol INT PRIMARY KEY, dataCol VARCHAR(MAX));
    INSERT INTO @toInsert (idCol, dataCol) VALUES (1, 'row 1'), (2, 'row 2'), (3, 'row 3');

    -- Logic to insert the data: only rows whose dataCol is not already in realTable

    INSERT INTO realTable (idCol, dataCol)
    SELECT TI.idCol, TI.dataCol
    FROM @toInsert TI
    WHERE NOT EXISTS (SELECT 1 FROM realTable RT WHERE RT.dataCol = TI.dataCol);

In many situations I use this approach, as it makes the T-SQL code easier to read, refactor, and unit test.




Answer 2:


Following Vladimir Baranov's comment, and after reading Dan Guzman's blog posts Conditional INSERT/UPDATE Race Condition and “UPSERT” Race Condition With MERGE, it seems that all three options suffer from the same drawbacks in a multi-user environment.

Eliminating the merge option as overkill, we are left with options 1 and 3.

Dan's proposed solution is to use an explicit transaction and add lock hints to the select to avoid the race condition.

This way, option 1 becomes:

    BEGIN TRANSACTION
    IF NOT EXISTS(select 1 from table WITH (UPDLOCK, HOLDLOCK) where <condition>)
    BEGIN
        INSERT...VALUES
    END
    COMMIT TRANSACTION

and option 3 becomes:

    BEGIN TRANSACTION
    INSERT INTO table (<values list>)
    SELECT <values list>
    WHERE NOT EXISTS(select 1 from table WITH (UPDLOCK, HOLDLOCK) where <condition>)
    COMMIT TRANSACTION

Of course, both options need some error handling: every transaction should be wrapped in a try...catch so that we can roll back the transaction in case of an error.
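As a rough sketch of what that error handling might look like for option 3, assuming SQL Server 2012 or later (for `THROW`): the table `dbo.Customers`, its columns, and the variable values are hypothetical and only there to make the example complete.

```sql
-- Hypothetical table and values, for illustration only.
CREATE TABLE dbo.Customers (Email VARCHAR(320) NOT NULL, Name VARCHAR(100) NOT NULL);

DECLARE @Email VARCHAR(320) = 'someone@example.com',
        @Name  VARCHAR(100) = 'Someone';

-- Make run-time errors abort the whole transaction instead of leaving it half done.
SET XACT_ABORT ON;

BEGIN TRY
    BEGIN TRANSACTION;

    -- Option 3 with lock hints: the range checked by NOT EXISTS stays locked
    -- until the transaction ends, so no other session can insert the row in between.
    INSERT INTO dbo.Customers (Email, Name)
    SELECT @Email, @Name
    WHERE NOT EXISTS (SELECT 1
                      FROM dbo.Customers WITH (UPDLOCK, HOLDLOCK)
                      WHERE Email = @Email);

    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    -- Roll back only if a transaction is still open (committable or doomed).
    IF XACT_STATE() <> 0
        ROLLBACK TRANSACTION;

    -- Re-raise the original error to the caller.
    THROW;
END CATCH;
```

Checking `XACT_STATE()` before rolling back avoids a second error in the catch block when no transaction is open, and `THROW` keeps the original error number and message for the calling code.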

That being said, I think the 3rd option is probably my personal favorite, but I don't think there should be a difference.

Update

Following a conversation I've had with Aaron Bertrand in the comments of some other question - I'm not entirely convinced that using ISOLATION LEVEL is a better solution than individual query hints, but at least that's another option to consider:

    SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

    BEGIN TRANSACTION;
    INSERT INTO table (<values list>)
    SELECT <values list>
    WHERE NOT EXISTS(select 1 from table where <condition>);
    COMMIT TRANSACTION;
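One practical difference from the hint-based versions is scope: a table hint applies to a single table reference, while `SET TRANSACTION ISOLATION LEVEL` stays in effect for the connection until it is changed again (when it is set inside a stored procedure, it reverts when the procedure returns). A minimal sketch of restoring the level afterwards, assuming a hypothetical `dbo.Subscribers` table and that the connection was previously at the default READ COMMITTED level:

```sql
-- Hypothetical table and value, for illustration only.
CREATE TABLE dbo.Subscribers (Email VARCHAR(320) NOT NULL);

DECLARE @Email VARCHAR(320) = 'someone@example.com';

-- Applies to every statement on this connection from here on.
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

BEGIN TRANSACTION;
INSERT INTO dbo.Subscribers (Email)
SELECT @Email
WHERE NOT EXISTS (SELECT 1 FROM dbo.Subscribers WHERE Email = @Email);
COMMIT TRANSACTION;

-- Assumption: the connection was using READ COMMITTED before; restore it
-- so later statements do not keep taking serializable range locks.
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
```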


Source: https://stackoverflow.com/questions/38497259/what-is-the-best-practice-for-inserting-a-record-if-it-doesnt-already-exist
