I have a table of invoices being prepared, and then ready for printing.
[STATUS]
column is Draft, Print, Printing, Printed
I need to get the ID of the first (FIFO) record to be printed, and change the record status. The operation must be threadsafe so that another process does not select the same InvoiceID
Can I do this (looks atomic to me, but maybe not ...):
1:
WITH CTE AS ( SELECT TOP(1) [InvoiceID], [Status] FROM INVOICES WHERE [Status] = 'Print' ORDER BY [PrintRequestedDate], [InvoiceID] ) UPDATE CTE SET [Status] = 'Printing' , @InvoiceID = [InvoiceID]
... perform operations using @InvoiceID
...
UPDATE INVOICES SET [Status] = 'Printed' WHERE [InvoiceID] = @InvoiceID
or must I use this (for the first statement)
2:
UPDATE INVOICES SET [Status] = 'Printing' , @InvoiceID = [InvoiceID] WHERE [InvoiceID] = ( SELECT TOP 1 [InvoiceID] FROM INVOICES WITH (UPDLOCK) WHERE [Status] = 'Print' ORDER BY [PrintRequestedDate], [InvoiceID] )
... perform operations using @InvoiceID
... etc.
(I cannot hold a transaction open from changing status to "Printing" until the end of the process, i.e. when status is finally changed to "Printed").
EDIT:
In case it matters the DB is READ_COMMITTED_SNAPSHOT
I can hold a transaction for both UPDATE STATUS to "Printing" AND get the ID. But I cannot continue to keep transaction open all the way through to changing the status to "Printed". This is an SSRS report, and it makes several different queries to SQL to get various bits of the invoice, and it might crash/whatever, leaving the transaction open.
@Gordon Linoff "If you want a queue" The FIFO sequence is not critical, I would just like invoices that are requested first to be printed first ... "more or less" (don't want any unnecessary complexity ...)
@Martin Smith "looks like a usual table as queue requirement" - yes, exactly that, thanks for the very useful link.
SOLUTION:
The solution I am adopting is from comments:
@lad2025 pointed me to SQL Server Process Queue Race Condition which uses WITH (ROWLOCK, READPAST, UPDLOCK)
and @MartinSmith explained what the Isolation issue is and pointed me at Using tables as Queues - which talks about exactly what I am trying to do.
I have not grasped why UPDATE TOP 1
is safe, and UPDATE MyTable SET xxx = yyy WHERE MyColumn = (SELECT TOP 1 SomeColumn FROM SomeTable ORDER BY AnotherColumn)
(without Isolation Hints) is not, and I ought to educate myself, but I'm happy just to put the isolation hints in my code and get on with something else :)
Thanks for all the help.
My concern would be duplicate [InvoiceID]
Multiple print requests for the same [InvoiceID]
On the first update ONE row gets set [Status] = 'Printing'
On the second update all [InvoiceID] rows get set [Status] = 'Printed'
This would even set rows with status = 'draft'
Maybe that is what you want
Another process could pick up the same [InvoiceID] before the set [Status] = 'Print'
So some duplicates will print and some will not
I go with comments on use the update lock
This is non-deterministic but you could just take top (1)
and skip the order by
. You will tend to get the most recent row but it is not guaranteed. If you clear the queue then you get em all.
This demonstrates you can lose 'draft' = 1
declare @invID int; declare @T table (iden int identity primary key, invID int, status tinyint); insert into @T values (1, 2), (5, 1), (3, 1), (4, 1), (4, 2), (2, 1), (1, 1), (5, 2), (5, 2); declare @iden int; select * from @t order by iden; declare @rowcount int = 1; while (@ROWCOUNT > 0) begin update top (1) t set t.status = 3, @invID = t.invID, @iden = t.iden from @t t where t.status = '2'; set @rowcount = @@ROWCOUNT; if(@rowcount > 0) begin select @invID, @iden; -- do stuff update t set t.status = 4 from @t t where t.invID = @invID; -- t.iden = @iden; select * from @T order by iden; end end
ATOMicity of a Single Statement
I think your code's fine as it is. i.e. Because you have a single statement which updates the status to printing
as soon as the statement runs the status is updated; so anything running before which searches for print
would have updated the same record to printing
before your process saw it; so your process would pick a subsequent record, or any process hitting it after your statement had run would see it as printing
so would not pick it up`. There isn't really a scenario where a record could pick it up whilst the statement's running since as discussed a single SQL statement should be atomic.
Disclaimer
That said, I'm not enough of an expert to say whether explicit lock hints would help; in my opinion they're not needed as the above is atomic, but others in the comments are likely better informed than me. However, running a test (albeit where database and both threads are run on the same machine) I couldn't create a race condition... maybe if the clients were on different machines / if there was a lot more concurrency you'd be more likely to see issues.
My hope is that others have interpreted your question differently, hence the disagreement.
Attempt to Disprove Myself
Here's the code I used to try to cause a race condition; you can drop it into LINQPad 5
, select language C# Program
, tweak the connectionstring (and optionally any statements) as required, then run:
const long NoOfRecordsToTest = 1000000; const string ConnectionString = "Server=.;Database=Play;Trusted_Connection=True;"; //assumes a database called "play" const string DropFifoQueueTable = @" if object_id('FIFOQueue') is not null drop table FIFOQueue"; const string CreateFifoQueueTable = @" create table FIFOQueue ( Id bigint not null identity (1,1) primary key clustered , Processed bit default (0) --0=queued, null=processing, 1=processed )"; const string GenerateDummyData = @" with cte as ( select 1 x union all select x + 1 from cte where x < @NoRowsToGenerate ) insert FIFOQueue(processed) select 0 from cte option (maxrecursion 0) "; const string GetNextFromQueue = @" with singleRecord as ( select top (1) Id, Processed from FIFOQueue --with(updlock, rowlock, readpast) --optionally include this per comment discussions where processed = 0 order by Id ) update singleRecord set processed = null output inserted.Id"; //we don't really need this last bit for our demo; I've included in case the discussion turns to this.. const string MarkRecordProcessed = @" update FIFOQueue set Processed = 1 where Id = @Id"; void Main() { SetupTestDatabase(); var task1 = Task<IList<long>>.Factory.StartNew(() => ExampleTaskForced(1)); var task2 = Task<IList<long>>.Factory.StartNew(() => ExampleTaskForced(2)); Task.WaitAll(task1, task2); foreach (var processedByBothThreads in task1.Result.Intersect(task2.Result)) { Console.WriteLine("Both threads processed id: {0}", processedByBothThreads); } Console.WriteLine("done"); } static void SetupTestDatabase() { RunSql<int>(new SqlCommand(DropFifoQueueTable), cmd => cmd.ExecuteNonQuery()); RunSql<int>(new SqlCommand(CreateFifoQueueTable), cmd => cmd.ExecuteNonQuery()); var generateData = new SqlCommand(GenerateDummyData); var param = generateData.Parameters.Add("@NoRowsToGenerate",SqlDbType.BigInt); param.Value = NoOfRecordsToTest; RunSql<int>(generateData, cmd => cmd.ExecuteNonQuery()); } static IList<long> ExampleTaskForced(int threadId) => new List<long>(ExampleTask(threadId)); //needed to ensure prevent lazy loadling from causing issues with our tests static IEnumerable<long> ExampleTask(int threadId) { long? x; while ((x = ProcessNextInQueue(threadId)).HasValue) { yield return x.Value; } //yield return 55; //optionally return a fake result just to prove that were there a duplicate we'd catch it } static long? ProcessNextInQueue(int threadId) { var id = RunSql<long?>(new SqlCommand(GetNextFromQueue), cmd => (long?)cmd.ExecuteScalar()); //Debug.WriteLine("Thread {0} is processing id {1}", threadId, id?.ToString() ?? "[null]"); //if you want to see how we're doing uncomment this line (commented out to improve performance / increase the likelihood of a collision /* then if we wanted to do the second bit we could include this if(id.HasValue) { var markProcessed = new SqlCommand(MarkRecordProcessed); var param = markProcessed.Parameters.Add("@Id",SqlDbType.BigInt); param.Value = id.Value; RunSql<int>(markProcessed, cmd => cmd.ExecuteNonQuery()); } */ return id; } static T RunSql<T>(SqlCommand command, Func<SqlCommand,T> callback) { try { using (var connection = new SqlConnection(ConnectionString)) { command.Connection = connection; command.Connection.Open(); return (T)callback(command); } } catch (Exception e) { Debug.WriteLine(e.ToString()); throw; } }
Other comments
The above discussion's really only talking about multiple threads taking the next record from the queue whilst avoiding any single record being picked up by multiple threads. There are a few other points...
Race Condition outside of SQL
Per our discussion, if FIFO is mandatory there are other things to worry about. i.e. whilst your threads will pick up each record in order after that it's up to them. e.g. Thread 1
gets record 10
then Thread 2
gets record 11
. Now Thread 2
sends record 11
to the printer before Thread 1
sends record 10
. If they're going to the same printer, your prints would be out of order. If they're different printers, not a problem; all prints on any printer are sequential. I'm going to assume the latter.
Exception Handling
If any exception occurs in a thread that's processing something (i.e. so the thread's record is printing
) then consideration should be made on how to handle this. One option is to keep that thread retrying; though that may be indefinite if it's some fundamental fault. Another is to put the record to some error
status to be handled by another process / where it's accepted that this record won't appear in order. Finally, if the order of invoices in the queue is an ideal rather than a hard requirement you could have the owning thread put the status back to print
so that it or another thread can pick up that record to retry (though again, if there's something fundamentally wrong with the record, this may block up the queue).
My recommendation here is the error
status; as then you have more visibility on the issue / can dedicate another process to dealing with issues.
Crash Handling
Another issue is that because your update to printing
is not held within a transaction, if the server crashes you leave the record with this status in the database, and when your system comes back online it's ignored. Ways to avoid that are to include a column saying which thread is processing it; so that when the system comes back up that thread can resume where it left off, or to include a date stamp so that after some period of time any records with status printing
which "timeout" can be swept up / reset to Error
or Print
statuses as required.
WITH CTE AS ( SELECT TOP(1) [InvoiceID], [Status], [ThreadId] FROM INVOICES WHERE [Status] = 'Print' OR ([Status] = 'Printing' and [ThreadId] = @ThreadId) --handle previous crash ORDER BY [PrintRequestedDate], [InvoiceID] ) UPDATE CTE SET [Status] = 'Printing' , [ThreadId] = @ThreadId OUTPUT Inserted.[InvoiceID]
Awareness of Other Processes
We've mostly been focussed on the printing element; but other processes may also be interacting with your Invoices
table. We can probably assume that aside from creating the initial Draft
record and updating it to Print
once ready for printing, those processes won't touch the Status
field. However, the same records could be locked by completely unrelated processes. If we want to ensure FIFO we must not use the ReadPast
hint, as some records may have status Print
but be locked, so we'd skip over them despite them having the earlier PrintRequestedDate
. However, if we want things to print as quickly as possible, with them being in order when otherwise not inconvenient, including ReadPast
will allow our print process to skip the locked records and carry on, coming back to handle them once they're released.
Similarly another process may lock our record when it's at Printing
status, so we can't update it to mark it as complete. Again, if we want to avoid this from causing a holdup we can make use of the ThreadId
column to allow our thread to leave the record at status Printing
and come back to clean it up later when it's not locked. Obviously this assumes that the ThreadId
column is only used by our print process.
Have a dedicated print queue table
To avoid some of the issues with unrelated processes locking your invoices, move the Status
field into its own table; so you only need to read from the invoices
table; not update it.
This will also have the advantage that (if you don't care about print history) you can delete the record once done, so you'll get better performance (as you won't have to search through the entire invoices table to find those which are ready to print). That said, there's another option for this one (if you're on SQL2008 or above).
Use Filtered Index
Since the Status column will be updated several times it's not a great candidate for an index; i.e. as the record's position in the index jumps from branch to branch as the status progresses. However, since we're filtering on it it's also something that would really benefit from having an index. To get around this contradiction, one option's to use a filtered index; i.e. only index those records we're interested in for our print process; so we maintain a small index for a big benefit.
create nonclustered index ixf_Invoices_PrintStatusAndDate on dbo.Invoices ([Status], [PrintRequestedDate]) include ([InvoiceId]) --just so we don't have to go off to the main table for this, but can get all we need form the index where [Status] in ('Print','Printing')
Use "Enums" / Reference Tables
I suspect your example uses strings to keep the demo code simple, however including this for completeness. Using strings in databases can make things hard to support. Rather than having status be a string value, instead use an ID from a related Statuses table.
create table Statuses ( ID smallint not null primary key clustered --not identity since we'll likely mirror this enum in code / may want to have defined ids ,Name ) go insert Statuses values (0, 'Draft') , (1,'Print') , (2,'Printing') , (3,'Printed') create table Invoices ( --... , StatusId smallint foreign key references Statuses(Id) --... )