Our application has a table called cargo_items. It can be seen as a kind of queue for items that are processed later. Initially there was a SINGLE job which took 3000 entries and
What you need is advisory locks.
SELECT id
FROM cargo_item
WHERE pg_try_advisory_lock(id)
LIMIT 3000
FOR UPDATE;
This will place an advisory lock on those rows, and other processes will not see them as long as they use the same pg_try_advisory_lock(id) call in their WHERE clause. Remember to release the locks with pg_advisory_unlock afterwards.
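A minimal sketch of the unlock step, assuming the worker keeps track of the ids it locked; the :locked_ids bind parameter is a placeholder, not something from the question:
-- release the advisory lock for each id this worker grabbed earlier;
-- :locked_ids is a hypothetical parameter holding those ids as an array
SELECT pg_advisory_unlock(id)
FROM   cargo_item
WHERE  id = ANY (:locked_ids);

-- or release every advisory lock held by the current session in one call
SELECT pg_advisory_unlock_all();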
In the related answer you are referring to:
The objective is to lock one row at a time. This works fine with or without advisory locks, because there is no chance for a deadlock - as long as you don't try to lock more rows in the same transaction.
Your example is different in that you want to lock 3000 rows at a time. There is potential for deadlock, unless all concurrent write operations lock rows in the same consistent order. Per the documentation:
The best defense against deadlocks is generally to avoid them by being certain that all applications using a database acquire locks on multiple objects in a consistent order.
Implement that with an ORDER BY in your subquery.
UPDATE cargo_item item
SET    job_id = 'SOME_UUID', job_ts = now()
FROM  (
   SELECT id
   FROM   cargo_item
   WHERE  state = 'NEW' AND job_id IS NULL
   ORDER  BY id
   LIMIT  3000
   FOR    UPDATE
   ) sub
WHERE  item.id = sub.id;
This is safe and reliable, as long as all transactions acquire locks in the same order and concurrent updates of the ordering columns are not to be expected. (Read the yellow "CAUTION" box at the end of this chapter in the manual.) So this should be safe in your case, since you are not going to update the id column.
Effectively only one client at a time can manipulate rows this way. Concurrent transactions would try to lock the same (locked) rows and wait for the first transaction to finish.
Advisory locks are useful if you have many or very long-running concurrent transactions (it doesn't seem you do). With only a few, it will be cheaper overall to just use the above query and have concurrent transactions wait their turn.
It seems concurrent access isn't a problem per se in your setup. Concurrency is an issue created by your current solution.
Instead, do it all in a single UPDATE. Assign batches of n numbers (3000 in the example) to each UUID and update all at once. Should be fastest.
UPDATE cargo_item c
SET    job_id = u.uuid_col
     , job_ts = now()
FROM  (
   SELECT row_number() OVER () AS rn, uuid_col
   FROM   uuid_tbl
   WHERE  <some_criteria>  -- or see below
   ) u
JOIN  (
   -- lock the rows in a nested subquery first: FOR UPDATE cannot be combined
   -- with window functions at the same query level
   SELECT (row_number() OVER () - 1) / 3000 + 1 AS rn, id
   FROM  (
      SELECT id
      FROM   cargo_item
      WHERE  state = 'NEW' AND job_id IS NULL
      FOR    UPDATE  -- just to be sure
      ) ci
   ) c2 USING (rn)
WHERE  c2.id = c.id;
Integer division truncates. You get 1 for the first 3000 rows, 2 for the next 3000 rows, etc.
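To see the batch numbering in isolation, here is a small self-contained check; the row numbers come from generate_series rather than from cargo_item:
-- batch numbers for hypothetical row numbers 1 .. 9001
SELECT rn, (rn - 1) / 3000 + 1 AS batch_no
FROM   generate_series(1, 9001) AS rn
WHERE  rn IN (1, 3000, 3001, 6000, 6001, 9001);
-- rows 1 .. 3000 get batch 1, 3001 .. 6000 get batch 2, and so on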
I pick rows arbitrarily; you could add ORDER BY inside the OVER () clause of row_number() to assign specific rows.
If you don't have a table of UUIDs to dispatch (uuid_tbl), you can supply them with a VALUES expression instead.
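One way the u subquery from the statement above might look with inlined values; the two UUID literals are just placeholders for your real job ids:
-- drop-in replacement for the "u" subquery: one row per batch to dispatch
SELECT row_number() OVER () AS rn, uuid_col
FROM  (
   VALUES
      ('0d8a9b1e-0000-4000-8000-000000000001'::uuid)  -- placeholder UUID
    , ('0d8a9b1e-0000-4000-8000-000000000002'::uuid)  -- placeholder UUID
   ) v(uuid_col)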
You get batches of 3000 rows. The last batch will hold fewer than 3000 rows if the number of qualifying rows is not a multiple of 3000.
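If you want to double-check the batch sizes after such an update, a simple count per job id shows them; this is just a sanity check, not part of the assignment itself:
-- how many rows ended up assigned to each job
SELECT job_id, count(*) AS assigned_rows
FROM   cargo_item
WHERE  job_id IS NOT NULL
GROUP  BY job_id
ORDER  BY job_id;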
You will have deadlocks with this approach. You could avoid them by simply using ORDER BY id in the subquery.
But that prevents these queries from running concurrently, as concurrent queries will always try to mark the lowest free id first and block until the first client commits. I don't think this is a problem if you process, say, less than one batch per second.
You don't need advisory locks. Avoid them if you can.