Question:

I have a table of invoices being prepared, and then ready for printing.

The [Status] column takes the values Draft, Print, Printing, Printed.

I need to get the ID of the first (FIFO) record to be printed, and change the record status. The operation must be threadsafe so that another process does not select the same InvoiceID.

Can I do this (looks atomic to me, but maybe not ...):

1:

    WITH CTE AS
    (
        SELECT TOP(1) [InvoiceID], [Status]
        FROM    INVOICES
        WHERE   [Status] = 'Print'
        ORDER BY [PrintRequestedDate], [InvoiceID]
    )
    UPDATE CTE
    SET [Status] = 'Printing'
        , @InvoiceID = [InvoiceID]

... perform operations using @InvoiceID ...

    UPDATE  INVOICES
    SET [Status] = 'Printed'
    WHERE   [InvoiceID] = @InvoiceID

or must I use this (for the first statement)

2:

    UPDATE INVOICES
    SET    [Status] = 'Printing'
        , @InvoiceID = [InvoiceID]
    WHERE  [InvoiceID] =
    (
        SELECT TOP 1 [InvoiceID]
        FROM    INVOICES WITH (UPDLOCK)
        WHERE   [Status] = 'Print'
        ORDER BY [PrintRequestedDate], [InvoiceID]
    )

... perform operations using @InvoiceID ... etc.

(I cannot hold a transaction open from changing status to "Printing" until the end of the process, i.e. when status is finally changed to "Printed").

EDIT:

In case it matters, the DB has READ_COMMITTED_SNAPSHOT enabled.

I can hold a transaction for both UPDATE STATUS to "Printing" AND get the ID. But I cannot continue to keep transaction open all the way through to changing the status to "Printed". This is an SSRS report, and it makes several different queries to SQL to get various bits of the invoice, and it might crash/whatever, leaving the transaction open.

@Gordon Linoff "If you want a queue" The FIFO sequence is not critical, I would just like invoices that are requested first to be printed first ... "more or less" (don't want any unnecessary complexity ...)

@Martin Smith "looks like a usual table as queue requirement" - yes, exactly that, thanks for the very useful link.

SOLUTION:

The solution I am adopting is from comments:

@lad2025 pointed me to SQL Server Process Queue Race Condition which uses WITH (ROWLOCK, READPAST, UPDLOCK) and @MartinSmith explained what the Isolation issue is and pointed me at Using tables as Queues - which talks about exactly what I am trying to do.
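
For reference, applying those hints to the first statement from the question might look like the sketch below. It is a minimal illustration, not the exact code from either linked article. The int type for @InvoiceID is an assumption; the table and column names are the ones used in the question.

```sql
DECLARE @InvoiceID int;  -- assumed type; match the real [InvoiceID] column

WITH CTE AS
(
    SELECT TOP (1) [InvoiceID], [Status]
    FROM   INVOICES WITH (ROWLOCK, READPAST, UPDLOCK)
    WHERE  [Status] = 'Print'
    ORDER BY [PrintRequestedDate], [InvoiceID]
)
UPDATE CTE
SET    [Status] = 'Printing'
     , @InvoiceID = [InvoiceID];

-- @InvoiceID stays NULL when no unlocked row with Status = 'Print' was found
SELECT @InvoiceID AS ClaimedInvoiceID;
```

The claim is a single statement, so no transaction needs to stay open while the report runs; the later update to 'Printed' is a separate statement.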

I have not grasped why UPDATE TOP 1 is safe, and UPDATE MyTable SET xxx = yyy WHERE MyColumn = (SELECT TOP 1 SomeColumn FROM SomeTable ORDER BY AnotherColumn) (without Isolation Hints) is not, and I ought to educate myself, but I'm happy just to put the isolation hints in my code and get on with something else :)

Thanks for all the help.

Answer 1:

My concern would be duplicate [InvoiceID] values, i.e. multiple print requests for the same [InvoiceID].

On the first update, ONE row gets set to [Status] = 'Printing'.

On the second update, all rows with that [InvoiceID] get set to [Status] = 'Printed'.
This would even update rows with status 'Draft'.

Maybe that is what you want

Another process could pick up the same [InvoiceID] before the set [Status] = 'Print'

So some duplicates will print and some will not.

I go with the comments on using the update lock.

This is non-deterministic but you could just take top (1) and skip the order by. You will tend to get the most recent row but it is not guaranteed. If you clear the queue then you get em all.

This demonstrates that you can lose 'draft' rows (status = 1):

    declare @invID int;
    declare @T table (iden int identity primary key, invID int, status tinyint);
    insert into @T values (1, 2), (5, 1), (3, 1), (4, 1), (4, 2), (2, 1), (1, 1), (5, 2), (5, 2);
    declare @iden int;
    select * from @t order by iden;

    declare @rowcount int = 1;

    while (@ROWCOUNT > 0)
        begin
            update top (1) t
            set t.status = 3, @invID = t.invID,  @iden = t.iden
            from @t t
            where t.status = '2';
            set @rowcount = @@ROWCOUNT;
            if(@rowcount > 0)
                begin
                    select @invID, @iden;
                    -- do stuff
                    update t
                    set t.status = 4
                    from @t t
                    where t.invID = @invID; -- t.iden = @iden;
                    select * from @T order by iden;
                end
        end

Answer 2:

Atomicity of a Single Statement

I think your code's fine as it is, because a single statement both finds the record and updates its status to Printing. Anything that ran before your statement and searched for Print would already have updated that record to Printing, so your process would pick a subsequent record; any process hitting the table after your statement had run would see the record as Printing and so would not pick it up. There isn't really a scenario where another process could pick the record up whilst the statement's running since, as discussed, a single SQL statement should be atomic.

Disclaimer

That said, I'm not enough of an expert to say whether explicit lock hints would help; in my opinion they're not needed as the above is atomic, but others in the comments are likely better informed than me. However, running a test (albeit where database and both threads are run on the same machine) I couldn't create a race condition... maybe if the clients were on different machines / if there was a lot more concurrency you'd be more likely to see issues.

My hope is that others have interpreted your question differently, hence the disagreement.

Attempt to Disprove Myself

Here's the code I used to try to cause a race condition; you can drop it into LINQPad 5, select language C# Program, tweak the connectionstring (and optionally any statements) as required, then run:

    const long NoOfRecordsToTest = 1000000;
    const string ConnectionString = "Server=.;Database=Play;Trusted_Connection=True;";  //assumes a database called "play"
    const string DropFifoQueueTable = @"
        if object_id('FIFOQueue') is not null
        drop table FIFOQueue";
    const string CreateFifoQueueTable = @"
        create table FIFOQueue
        (
            Id bigint not null identity (1,1) primary key clustered
            , Processed bit default (0) --0=queued, null=processing, 1=processed
        )";
    const string GenerateDummyData = @"
        with cte as
        (
            select 1 x
            union all
            select x + 1
            from cte
            where x < @NoRowsToGenerate
        )
        insert FIFOQueue(processed)
        select 0
        from cte
        option (maxrecursion 0)
        ";
    const string GetNextFromQueue = @"
        with singleRecord as
        (
            select top (1) Id, Processed
            from FIFOQueue --with(updlock, rowlock, readpast) --optionally include this per comment discussions
            where processed = 0
            order by Id
        )
        update singleRecord
        set processed = null
        output inserted.Id";
    //we don't really need this last bit for our demo; I've included it in case the discussion turns to this
    const string MarkRecordProcessed = @"
        update FIFOQueue
        set Processed = 1
        where Id = @Id";

    void Main()
    {
        SetupTestDatabase();

        var task1 = Task<IList<long>>.Factory.StartNew(() => ExampleTaskForced(1));
        var task2 = Task<IList<long>>.Factory.StartNew(() => ExampleTaskForced(2));
        Task.WaitAll(task1, task2);
        foreach (var processedByBothThreads in task1.Result.Intersect(task2.Result))
        {
            Console.WriteLine("Both threads processed id: {0}", processedByBothThreads);
        }
        Console.WriteLine("done");
    }

    static void SetupTestDatabase()
    {
        RunSql<int>(new SqlCommand(DropFifoQueueTable), cmd => cmd.ExecuteNonQuery());
        RunSql<int>(new SqlCommand(CreateFifoQueueTable), cmd => cmd.ExecuteNonQuery());
        var generateData = new SqlCommand(GenerateDummyData);
        var param = generateData.Parameters.Add("@NoRowsToGenerate",SqlDbType.BigInt);
        param.Value = NoOfRecordsToTest;
        RunSql<int>(generateData, cmd => cmd.ExecuteNonQuery());
    }

    //needed to prevent lazy loading from causing issues with our tests
    static IList<long> ExampleTaskForced(int threadId) => new List<long>(ExampleTask(threadId));
    static IEnumerable<long> ExampleTask(int threadId)
    {
        long? x;
        while ((x = ProcessNextInQueue(threadId)).HasValue)
        {
            yield return x.Value;
        }
        //yield return 55; //optionally return a fake result just to prove that were there a duplicate we'd catch it
    }

    static long? ProcessNextInQueue(int threadId)
    {
        var id = RunSql<long?>(new SqlCommand(GetNextFromQueue), cmd => (long?)cmd.ExecuteScalar());
        //if you want to see how we're doing uncomment the next line (commented out to improve performance / increase the likelihood of a collision)
        //Debug.WriteLine("Thread {0} is processing id {1}", threadId, id?.ToString() ?? "[null]");

        /* then if we wanted to do the second bit we could include this
        if(id.HasValue) {
            var markProcessed = new SqlCommand(MarkRecordProcessed);
            var param = markProcessed.Parameters.Add("@Id",SqlDbType.BigInt);
            param.Value = id.Value;
            RunSql<int>(markProcessed, cmd => cmd.ExecuteNonQuery());
        }
        */

        return id;
    }

    static T RunSql<T>(SqlCommand command, Func<SqlCommand,T> callback)
    {
        try
        {
            using (var connection = new SqlConnection(ConnectionString))
            {
                command.Connection = connection;
                command.Connection.Open();
                return (T)callback(command);
            }
        }
        catch (Exception e)
        {
            Debug.WriteLine(e.ToString());
            throw;
        }
    }

Other comments

The above discussion's really only talking about multiple threads taking the next record from the queue whilst avoiding any single record being picked up by multiple threads. There are a few other points...

Race Condition outside of SQL

Per our discussion, if FIFO is mandatory there are other things to worry about. Whilst your threads will pick up each record in order, after that it's up to them, e.g. Thread 1 gets record 10, then Thread 2 gets record 11. Now Thread 2 sends record 11 to the printer before Thread 1 sends record 10. If they're going to the same printer, your prints would be out of order. If they're different printers, not a problem; all prints on any printer are sequential. I'm going to assume the latter.

Exception Handling

If any exception occurs in a thread that's processing something (i.e. so the thread's record is printing) then consideration should be made on how to handle this. One option is to keep that thread retrying; though that may be indefinite if it's some fundamental fault. Another is to put the record to some error status to be handled by another process / where it's accepted that this record won't appear in order. Finally, if the order of invoices in the queue is an ideal rather than a hard requirement you could have the owning thread put the status back to print so that it or another thread can pick up that record to retry (though again, if there's something fundamentally wrong with the record, this may block up the queue).

My recommendation here is the error status; as then you have more visibility on the issue / can dedicate another process to dealing with issues.
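
A minimal sketch of that option. It assumes an extra 'Error' status value, which is not one of the statuses listed in the question, and that the calling process runs the statement when it catches a failure for the invoice it claimed. @InvoiceID is the value captured when the record was set to Printing; its int type is an assumption.

```sql
DECLARE @InvoiceID int;  -- assumed type; set by the earlier claim statement

-- Hypothetical 'Error' status: park the record where another process can inspect it.
UPDATE INVOICES
SET    [Status] = 'Error'
WHERE  [InvoiceID] = @InvoiceID
AND    [Status] = 'Printing';  -- leave rows that are not mid-print untouched
```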

Crash Handling

Another issue is that because your update to printing is not held within a transaction, if the server crashes you leave the record with this status in the database, and when your system comes back online it's ignored. Ways to avoid that are to include a column saying which thread is processing it; so that when the system comes back up that thread can resume where it left off, or to include a date stamp so that after some period of time any records with status printing which "timeout" can be swept up / reset to Error or Print statuses as required.

    WITH CTE AS
    (
        SELECT TOP(1) [InvoiceID], [Status], [ThreadId]
        FROM    INVOICES
        WHERE   [Status] = 'Print'
        OR     ([Status] = 'Printing' and [ThreadId] = @ThreadId) --handle previous crash
        ORDER BY [PrintRequestedDate], [InvoiceID]
    )
    UPDATE CTE
    SET [Status] = 'Printing'
    , [ThreadId] = @ThreadId
    OUTPUT Inserted.[InvoiceID]
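
The date stamp alternative could be sketched as follows. [PrintingStartedDate] is a hypothetical column that the claim statement would need to set (for example to SYSUTCDATETIME()) alongside [Status] = 'Printing', and the 30 minute timeout is an arbitrary assumption.

```sql
DECLARE @TimeoutMinutes int = 30;  -- arbitrary; tune to how long a print normally takes

-- Put abandoned records back in the queue (or set an error status instead, as preferred).
UPDATE INVOICES
SET    [Status] = 'Print'
     , [PrintingStartedDate] = NULL   -- hypothetical column, see above
WHERE  [Status] = 'Printing'
AND    [PrintingStartedDate] < DATEADD(MINUTE, -@TimeoutMinutes, SYSUTCDATETIME());
```

A scheduled job could run this periodically. A timeout that is too short would requeue invoices that are still printing, so it should comfortably exceed the normal print time.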

Awareness of Other Processes

We've mostly been focussed on the printing element; but other processes may also be interacting with your Invoices table. We can probably assume that aside from creating the initial Draft record and updating it to Print once ready for printing, those processes won't touch the Status field. However, the same records could be locked by completely unrelated processes. If we want to ensure FIFO we must not use the ReadPast hint, as some records may have status Print but be locked, so we'd skip over them despite them having the earlier PrintRequestedDate. However, if we want things to print as quickly as possible, with them being in order when otherwise not inconvenient, including ReadPast will allow our print process to skip the locked records and carry on, coming back to handle them once they're released.

Similarly another process may lock our record when it's at Printing status, so we can't update it to mark it as complete. Again, if we want to avoid this from causing a holdup we can make use of the ThreadId column to allow our thread to leave the record at status Printing and come back to clean it up later when it's not locked. Obviously this assumes that the ThreadId column is only used by our print process.
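
The ordering versus throughput trade-off around ReadPast comes down to one hint. As a rough sketch using the table from the question, the stricter variant simply leaves READPAST out, so the statement should wait for a locked row rather than skip it:

```sql
-- Strict ordering: no READPAST, so a locked 'Print' row blocks this statement
-- until the lock is released. Adding READPAST to the hint list gives the
-- "skip locked rows and come back later" behaviour instead.
WITH CTE AS
(
    SELECT TOP (1) [InvoiceID], [Status]
    FROM   INVOICES WITH (ROWLOCK, UPDLOCK)
    WHERE  [Status] = 'Print'
    ORDER BY [PrintRequestedDate], [InvoiceID]
)
UPDATE CTE
SET    [Status] = 'Printing'
OUTPUT Inserted.[InvoiceID];
```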

Have a dedicated print queue table

To avoid some of the issues with unrelated processes locking your invoices, move the Status field into its own table; so you only need to read from the invoices table; not update it.

This will also have the advantage that (if you don't care about print history) you can delete the record once done, so you'll get better performance (as you won't have to search through the entire invoices table to find those which are ready to print). That said, there's another option for this one (if you're on SQL2008 or above).
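
Before moving on to that alternative, here is a rough sketch of the dedicated queue table idea. Every name in it (dbo.PrintQueue, QueueId and so on) is hypothetical, as are the column types; it only illustrates claiming and deleting queue entries without updating the invoices table.

```sql
-- Hypothetical queue table; all names and types here are illustrative.
CREATE TABLE dbo.PrintQueue
(
    QueueId            bigint IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
    InvoiceID          int          NOT NULL,  -- assumed type of INVOICES.[InvoiceID]
    [Status]           varchar(10)  NOT NULL DEFAULT ('Print'),
    PrintRequestedDate datetime2(0) NOT NULL DEFAULT (SYSUTCDATETIME())
);
GO

-- Claim the next entry; only the queue table is updated.
DECLARE @QueueId bigint, @InvoiceID int;

WITH NextItem AS
(
    SELECT TOP (1) QueueId, InvoiceID, [Status]
    FROM   dbo.PrintQueue WITH (ROWLOCK, READPAST, UPDLOCK)
    WHERE  [Status] = 'Print'
    ORDER BY PrintRequestedDate, QueueId
)
UPDATE NextItem
SET    [Status] = 'Printing'
     , @QueueId = QueueId
     , @InvoiceID = InvoiceID;

-- ... read from INVOICES using @InvoiceID and print ...

-- Once printed, remove the entry (if print history is not needed).
DELETE FROM dbo.PrintQueue
WHERE  QueueId = @QueueId;
```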

Use Filtered Index

Since the Status column will be updated several times it's not a great candidate for an index; i.e. as the record's position in the index jumps from branch to branch as the status progresses. However, since we're filtering on it it's also something that would really benefit from having an index. To get around this contradiction, one option's to use a filtered index; i.e. only index those records we're interested in for our print process; so we maintain a small index for a big benefit.

    create nonclustered index ixf_Invoices_PrintStatusAndDate
    on dbo.Invoices ([Status], [PrintRequestedDate])
    include ([InvoiceId]) --just so we don't have to go off to the main table for this, but can get all we need from the index
    where [Status] in ('Print','Printing')

Use "Enums" / Reference Tables

I suspect your example uses strings to keep the demo code simple; however, I'm including this for completeness. Using strings in databases can make things hard to support. Rather than having status be a string value, instead use an ID from a related Statuses table.

    create table Statuses
    (
        ID smallint not null primary key clustered --not identity since we'll likely mirror this enum in code / may want to have defined ids
        , Name varchar(20) not null --the column type was missing in the original; varchar(20) is an assumption
    )
    go
    insert Statuses
    values
      (0, 'Draft')
    , (1, 'Print')
    , (2, 'Printing')
    , (3, 'Printed')

    create table Invoices
    (
        --...
        , StatusId smallint foreign key references Statuses(Id)
        --...
    )

Source: SQL Threadsafe UPDATE TOP 1 for FIFO Queue

Tags

update

fifo