Say I have a simple table that has the following fields:
My scenario is a large number of individual updates vs. a bulk delete/bulk insert. I have historical sales data for multiple customers going back years. Until I get verified data (the 15th of the following month), I adjust the sales numbers every day to reflect the current state as obtained from another source (which means overwriting at most 45 days of sales per customer each day). There may be no changes, or there may be a few. I can either code the logic to find the differences and update/delete/insert only the affected records, or I can simply blow away yesterday's numbers and insert today's. The latter approach is clearly simpler, but if it's going to kill the table's performance through churn, then it's worth writing the extra logic to identify the handful (or none) of records that changed and touch only those.
So I'm replacing the records, and there may be some relationship between the old records and the new ones, but in general I don't necessarily want to match old data against new (that would be an extra step and would result in deletes, updates, and inserts). Also, relatively few fields would change (at most 7 out of 20, or 2 out of 15).
The records that are likely to be retrieved together will have been inserted at the same time and should therefore be physically close to one another. Does that make up for the performance loss from churn with the delete/insert approach, and is it better than the undo/redo cost of all those individual record updates? A sketch of the diff-based alternative follows.
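To make the first option concrete, the diff logic would look roughly like this MERGE sketch (SQL Server 2008 and later; Sales, StagedSales, and the column names are hypothetical stand-ins for the real schema):

-- StagedSales holds the current 45-day window as loaded from the other source
MERGE Sales AS target
USING StagedSales AS source
    ON target.CustomerId = source.CustomerId
   AND target.SaleDate = source.SaleDate
WHEN MATCHED AND target.Amount <> source.Amount THEN
    UPDATE SET Amount = source.Amount        -- value changed: update in place
WHEN NOT MATCHED BY TARGET THEN
    INSERT (CustomerId, SaleDate, Amount)
    VALUES (source.CustomerId, source.SaleDate, source.Amount)  -- new row
WHEN NOT MATCHED BY SOURCE
    AND target.SaleDate >= DATEADD(day, -45, GETDATE()) THEN
    DELETE;                                   -- row vanished from the source window

The last clause is restricted to the 45-day window so the MERGE never touches verified historical rows.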
One command against a row should always be faster than two against that same row, so the UPDATE alone would be better.
EDIT: set up the table:
create table YourTable
(YourName varchar(50) primary key
,Tag int
)
insert into YourTable values ('first value',1)
Run this, which takes 1 second on my system (SQL Server 2005):
SET NOCOUNT ON
declare @x int
declare @y int
select @x=0,@y=0
-- seed the value so the loop's WHERE clause always matches the row
UPDATE YourTable set YourName='new name'
while @x<10000
begin
    set @x=@x+1
    -- one command per iteration: update the row in place
    update YourTable set YourName='new name' where YourName='new name'
    set @y=@y+@@ROWCOUNT
end
print @y
Run this, which took 2 seconds on my system:
SET NOCOUNT ON
declare @x int
declare @y int
select @x=0,@y=0
while @x<10000
begin
    set @x=@x+1
    -- two commands per iteration: delete the row, then re-insert it
    DELETE YourTable WHERE YourName='new name'
    insert into YourTable values ('new name',1)
    set @y=@y+@@ROWCOUNT
end
print @y
In specific cases, DELETE + INSERT will save you time. I have a table with 30,000-odd rows, and these records are updated/inserted daily from a data file. The upload process generates 95% UPDATE statements, since the records are already there, and 5% INSERTs for the ones that do not exist. Alternatively, loading the data file records into a temp table, deleting the destination table rows that match the temp table, and then inserting the same rows from the temp table has shown a 50% gain in time, as sketched below.
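A rough sketch of that pattern, with hypothetical names (Destination, #Upload, Id, Amount) standing in for the real tables:

-- 1. stage the data file in a temp table (via BULK INSERT, bcp, SSIS, etc.)
CREATE TABLE #Upload (Id int PRIMARY KEY, Amount money)
-- ... load the data file into #Upload here ...

-- 2. delete the destination rows that are about to be replaced
DELETE d
FROM Destination AS d
JOIN #Upload AS u ON u.Id = d.Id

-- 3. insert everything from the temp table in one set-based statement
INSERT INTO Destination (Id, Amount)
SELECT Id, Amount FROM #Upload

DROP TABLE #Upload

Two set-based statements replace 30,000 single-row UPDATEs, which is where the time saving comes from.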
Keep in mind that the fragmentation caused by DELETE + INSERT, as opposed to a correctly implemented UPDATE, will make a great difference over time.
That's why, for instance, the REPLACE INTO that MySQL implements is discouraged in favor of the INSERT INTO ... ON DUPLICATE KEY UPDATE ... syntax.
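To illustrate the difference against the YourTable table from the earlier answer (assuming, as there, that YourName is the primary key):

-- REPLACE INTO: on a key conflict, MySQL deletes the old row and inserts a new one (delete + insert churn)
REPLACE INTO YourTable (YourName, Tag) VALUES ('new name', 2);

-- INSERT ... ON DUPLICATE KEY UPDATE: on a key conflict, MySQL updates the existing row in place
INSERT INTO YourTable (YourName, Tag) VALUES ('new name', 2)
ON DUPLICATE KEY UPDATE Tag = 2;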
It depends on the product. A product could be implemented that, under the covers, converts every UPDATE into a transactionally wrapped DELETE and INSERT, provided the results are consistent with the UPDATE semantics.
I'm not saying I'm aware of any product that does this, but it's perfectly legal.
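As a minimal sketch of what such a rewrite would have to guarantee (reusing YourTable from the earlier answer), the pair must run atomically so no reader ever sees the row missing:

BEGIN TRANSACTION
    -- remove the current version of the row
    DELETE YourTable WHERE YourName = 'first value'
    -- re-insert it with the new value; the net effect is the same as
    -- UPDATE YourTable SET YourName = 'new name' WHERE YourName = 'first value'
    INSERT INTO YourTable VALUES ('new name', 1)
COMMIT TRANSACTION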
Just tried updating 43 fields on a table with 44 fields; the remaining field was the clustered primary key.
The update took 8 seconds.
A DELETE + INSERT completed faster than the minimum time interval that "Client Statistics" reports via SQL Server Management Studio.
Peter
MS SQL 2008