In PostgreSQL, do multiple UPDATEs to different rows in the same table have a locking conflict?

Posted by 限于喜欢 on 2019-12-20 04:24:34

Question


I'm wondering about an update I'm making on a big table: do I need to worry about locks?

I have a table looking like this:

CREATE TABLE "ItemsToProcess"( 
"id" text, 
"WorkerInstanceId" text, 
"ProcessingStartTime" timestamp with time zone, 
"UpdatedTime" timestamp with time zone, 
CONSTRAINT "ITP_PK" PRIMARY KEY ("id")
)WITH (
  OIDS=FALSE
);

Initially, this table has ~2.0 million rows in it, with only the id filled in; WorkerInstanceId and the two timestamps are NULL by default and at the start of the run.

What happens is that some worker apps (at least two, but around 10-13 in production) will each mark a batch of IDs (I plan to set batchSize to 200) from this table for them to process. What happens during processing doesn't really matter now. The marking of a batch looks like this:

UPDATE "ItemsToProcess" 
   SET "WorkerInstanceId" = ?, "ProcessingStartTime" = current_timestamp
 WHERE "WorkerInstanceId" is NULL
 LIMIT 200;

My question is: do I need to worry about locking the rows I'm going to update before making the update?

Postgres documentation says:

ROW EXCLUSIVE

Conflicts with the SHARE, SHARE ROW EXCLUSIVE, EXCLUSIVE, and ACCESS EXCLUSIVE lock modes.

The commands UPDATE, DELETE, and INSERT acquire this lock mode on the target table (in addition to ACCESS SHARE locks on any other referenced tables). In general, this lock mode will be acquired by any command that modifies the data in a table.

So I think that whenever one of the workers makes this update, the whole table is locked, 200 rows are updated, and the lock is released at the end. While the lock is in place, the other workers wait for it to be released. Am I right, or am I missing something?

Thanks for the help!


Answer 1:


You are missing a couple of things.

First, PostgreSQL does not offer a "LIMIT" option for UPDATE. See the docs for UPDATE.

Second, note that "ROW EXCLUSIVE" does not conflict with itself; it conflicts with "SHARE ROW EXCLUSIVE", which is a different mode. So your UPDATE statements can safely run concurrently from multiple workers. You will still want to keep your UPDATE times low, but you already have a built-in way to tune that: lower your batchSize if you run into problems.
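To see this for yourself, here is a minimal sketch using two psql sessions and the pg_locks system view. The ids 'item-1' and 'item-2' are hypothetical values, assumed to exist in the table; the default READ COMMITTED isolation level is also assumed.

```sql
-- Session A: update one row and keep the transaction open.
BEGIN;
UPDATE "ItemsToProcess"
   SET "UpdatedTime" = current_timestamp
 WHERE "id" = 'item-1';          -- hypothetical id

-- Session B: update a DIFFERENT row while A is still open.
-- This should return immediately instead of waiting for A.
BEGIN;
UPDATE "ItemsToProcess"
   SET "UpdatedTime" = current_timestamp
 WHERE "id" = 'item-2';          -- hypothetical id

-- Either session: list the table-level locks currently held on the table.
-- Both sessions should appear with mode RowExclusiveLock and granted = true.
SELECT pid, locktype, mode, granted
  FROM pg_locks
 WHERE relation = '"ItemsToProcess"'::regclass;

-- Finish both transactions.
COMMIT;  -- run in session A
COMMIT;  -- run in session B
```

The table-level lock is shared between the two writers; only writes to the same row make one session wait for the other.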




Answer 2:


UPDATE locks the row, so you do not need to lock it first. If you try to UPDATE overlapping sets of rows simultaneously, the second UPDATE will wait for the first's transaction to commit or roll back.

The big problem with your approach - other than the fact that UPDATE doesn't have a LIMIT clause - is that multiple workers will all try to grab the same rows. Here's what happens:

worker1: filters the table to find 200 rows and locks them
worker1: starts updating rows
worker2: filters the table to find 200 rows
worker2: tries to start updating rows, but has selected the same rows as worker1, so it blocks on worker1's lock
worker1: finishes updating rows
worker2: after lock release, re-checks the WHERE condition and finds out that none of the rows match anymore because worker1 has updated them. Updates zero rows.

... and repeat!
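The sequence above can be reproduced on a single row with two psql sessions. This is only a sketch: the id 'item-1' and the worker names are hypothetical, and the default READ COMMITTED isolation level is assumed.

```sql
-- Session A (worker1): claim the row, but do not commit yet.
BEGIN;
UPDATE "ItemsToProcess"
   SET "WorkerInstanceId" = 'worker-1',
       "ProcessingStartTime" = current_timestamp
 WHERE "id" = 'item-1'
   AND "WorkerInstanceId" IS NULL;

-- Session B (worker2): try to claim the same row.
-- This statement blocks, waiting on session A's row lock.
UPDATE "ItemsToProcess"
   SET "WorkerInstanceId" = 'worker-2',
       "ProcessingStartTime" = current_timestamp
 WHERE "id" = 'item-1'
   AND "WorkerInstanceId" IS NULL;

-- Session A: release the row lock.
COMMIT;

-- Session B now wakes up, re-checks the WHERE clause against the
-- committed row, finds "WorkerInstanceId" is no longer NULL, and
-- updates zero rows.
```

With 200-row batches the same thing happens for every row the two workers have in common.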

You need to either:

  • Have a central queue handing out rows in a proper concurrency-safe way; or
  • Assign workers non-overlapping ranges of IDs to work on

As for LIMIT - you could use WHERE id IN (SELECT t.id FROM thetable t ORDER BY t.id LIMIT 200) - but you'd have the same problem with both workers choosing the same set of rows to update.
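One way to keep the subquery form without workers colliding is to lock the candidate rows with FOR UPDATE SKIP LOCKED. This is a technique not mentioned in the answer above and it assumes PostgreSQL 9.5 or later, where SKIP LOCKED is available. Treat it as a sketch; 'worker-1' is a hypothetical worker id.

```sql
-- Claim up to 200 unclaimed rows for this worker.
-- Rows already locked by another worker's in-flight claim are skipped
-- instead of waited on, so concurrent workers pick disjoint batches.
UPDATE "ItemsToProcess" AS itp
   SET "WorkerInstanceId" = 'worker-1',        -- hypothetical worker id
       "ProcessingStartTime" = current_timestamp
 WHERE itp."id" IN (
         SELECT "id"
           FROM "ItemsToProcess"
          WHERE "WorkerInstanceId" IS NULL
          ORDER BY "id"
          LIMIT 200
            FOR UPDATE SKIP LOCKED
       )
RETURNING itp."id";                            -- the ids this worker now owns
```

A worker may get fewer than 200 rows back if other workers hold locks on some of the candidates; it can simply run the statement again for its next batch.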



Source: https://stackoverflow.com/questions/11761281/in-postgresql-do-multiple-updates-to-different-rows-in-the-same-table-having-a