In SQL, is UPDATE always faster than DELETE+INSERT?

梦谈多话 2020-11-29 20:01

Say I have a simple table that has the following fields:

  1. ID: int, autoincremental (identity), primary key
  2. Name: varchar(50), unique, has unique index
15 Answers
  • 2020-11-29 20:37

    The bigger the table (in number and size of columns), the more expensive it becomes to delete and insert rather than update, because you have to pay the price of UNDO and REDO. DELETEs consume more UNDO space than UPDATEs, and your REDO contains twice as many statements as necessary.

    Besides, it is plain wrong from a business point of view. Consider how much harder it would be to understand a notional audit trail on that table.


    There are some scenarios involving bulk updates of all the rows in a table where it is faster to create a new table using CTAS from the old table (applying the update in the projection of the SELECT clause), drop the old table and rename the new one. The side-effects are re-creating indexes, managing constraints and renewing privileges, but it is worth considering.
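
    A minimal sketch of that CTAS pattern, assuming Oracle syntax and a hypothetical customers table whose name column is being bulk-updated:

        -- The "update" is applied in the SELECT projection instead of UPDATE ... SET
        CREATE TABLE customers_new AS
        SELECT id,
               UPPER(name) AS name
        FROM   customers;

        DROP TABLE customers;
        RENAME customers_new TO customers;
        -- then re-create indexes, constraints and grants on the new table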

  • 2020-11-29 20:39

    I am afraid the body of your question is unrelated to the title question.

    To answer just the title question:

    In SQL, is UPDATE always faster than DELETE+INSERT?

    then the answer is NO!

    Just google for

    • "Expensive direct update"* "sql server"
    • "deferred update"* "sql server"

    Such updates are realized internally as delete+insert pairs, which is more costly (more processing) than a direct in-place update. These are the cases when

    • the update changes a column that is part of a unique (or primary) key, or
    • the new data does not fit (is bigger than) the row space allocated before the update (or even exceeds the maximum row size), resulting in fragmentation,
    • etc.
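
    For illustration, a minimal sketch (SQL Server syntax, hypothetical client table) of an update that touches a unique-key column and may therefore be executed internally as a delete/insert pair rather than in place:

        -- name is covered by a unique index, so this UPDATE is a candidate
        -- for a deferred (delete + insert) update
        UPDATE client
        SET    name = 'New Name'
        WHERE  id = 42;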

    My quick (non-exhaustive) search, which does not pretend to be comprehensive, turned up [1] and [2]:

    [1] Update Operations. Sybase® SQL Server Performance and Tuning Guide, Chapter 7: The SQL Server Query Optimizer. http://www.lcard.ru/~nail/sybase/perf/11500.htm
    [2] UPDATE Statements May be Replicated as DELETE/INSERT Pairs. http://support.microsoft.com/kb/238254

  • 2020-11-29 20:40

    In your case, I believe the update will be faster.

    Remember indexes!

    You have defined a primary key, and it will likely automatically become a clustered index (at least SQL Server does so). A clustered index means the records are physically laid out on disk according to the index. The DELETE operation itself won't cause much trouble; even after one record goes away, the index stays correct. But when you INSERT a new record, the DB engine will have to put this record in the correct location, which under some circumstances will cause some "reshuffling" of the old records to "make room" for the new one. That is where it will slow down the operation.

    An index (especially a clustered one) works best if the values are ever-increasing, so new records simply get appended at the tail. Maybe you can add an extra INT IDENTITY column to become the clustered index; this would simplify insert operations.
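
    A minimal sketch of that layout, assuming SQL Server syntax and a hypothetical Client table based on the fields in the question (the surrogate RowId takes over the IDENTITY role here, since SQL Server allows only one identity column per table):

        -- The ever-increasing surrogate key carries the clustered index,
        -- so new rows are simply appended at the tail of the table
        CREATE TABLE Client (
            RowId INT IDENTITY(1,1) NOT NULL,
            ID    INT NOT NULL,
            Name  VARCHAR(50) NOT NULL,
            CONSTRAINT PK_Client PRIMARY KEY NONCLUSTERED (ID)
        );
        CREATE UNIQUE CLUSTERED INDEX CIX_Client_RowId ON Client (RowId);
        CREATE UNIQUE INDEX UX_Client_Name ON Client (Name);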

  • 2020-11-29 20:42

    Delete + Insert is almost always faster because an Update has way more steps involved.

    Update:

    1. Look for the row using PK.
    2. Read the row from disk.
    3. Check for which values have changed
    4. Raise the onUpdate Trigger with populated :NEW and :OLD variables
    5. Write the new values to disk (the entire row)

      (This repeats for every row you're updating)

    Delete + Insert:

    1. Mark the rows as deleted (only in the PK index).
    2. Insert new rows at the end of the table.
    3. Update PK Index with locations of new records.

      (This doesn't repeat; it can all be performed in a single block of operations.)

    Using Delete + Insert will fragment your file system, but not that fast. Doing a lazy optimization in the background will always free unused blocks and pack the table back together.
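
    For concreteness, a minimal sketch of the two alternatives being compared, in generic SQL against a hypothetical client table:

        -- Direct update of one row:
        UPDATE client SET name = 'New Name' WHERE id = 123;

        -- The same logical change as delete + insert (note that with an
        -- IDENTITY key, SQL Server would additionally require
        -- SET IDENTITY_INSERT client ON to re-insert the same id):
        DELETE FROM client WHERE id = 123;
        INSERT INTO client (id, name) VALUES (123, 'New Name');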

  • 2020-11-29 20:43

    The question of speed is irrelevant without a specific speed problem.

    If you are writing SQL code to make a change to an existing row, you UPDATE it. Anything else is incorrect.

    If you're going to break the rules of how code should work, then you'd better have a damn good, quantified reason for it, and not a vague idea of "This way is faster", when you don't have any idea what "faster" is.

  • 2020-11-29 20:44

    What if you have a few million rows? Each row starts with one piece of data, perhaps a client name. As you collect data for clients, their entries must be updated. Now, let's assume that the collection of client data is distributed across numerous other machines, from which it is later collected and put into the database. If each client has unique information, then you would not be able to perform a bulk update; i.e., there is no WHERE-clause criterion you could use to update multiple clients in one shot. On the other hand, you could perform bulk inserts. So the question might be better posed as follows: is it better to perform millions of single updates, or is it better to compile them into large bulk deletes and inserts? In other words, instead of running "update [table] set field=data where clientid=123" a million times, you do "delete from [table] where clientid in ([all clients to be updated]); insert into [table] values (data for client1), (data for client2), ..." as sketched below.
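
    A minimal sketch of that bulk alternative, in generic SQL with hypothetical names:

        -- Instead of a million single-row updates:
        --   UPDATE client SET data = '...' WHERE clientid = 123;   -- repeated per client
        -- one bulk delete followed by one multi-row insert:
        DELETE FROM client WHERE clientid IN (123, 124, 125);

        INSERT INTO client (clientid, data) VALUES
            (123, 'data for client 123'),
            (124, 'data for client 124'),
            (125, 'data for client 125');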

    Is either choice better than the other, or are you screwed both ways?
