apparent transaction isolation violation in postgresql

问题

I am using PostgreSQL 9.3.12 on CentOS Linux.

I have two processes connecting to the same database, using a default transaction isolation level of "read committed". According to the postgres docs, one process in a transaction should not "see" changes made by another process in a transaction until they are committed.

A sequence I am seeing is:

process A starts its transaction
process A deletes everything from table T
process B starts its transaction
process B attempts a select for update on one row in table T
process B comes up empty (0 rows) and calls rollback
process A repopulates table T from incoming data
process A commits its transaction

Now, table T should have been populated before both transactions began, and process B's query should have turned up one row. And it does if these processes do not run concurrently.

My understanding is that process B should see the old copy of the desired row in table T, makes its changes, and those changes should be clobbered by process A's deletion and repopulation of table T. I can't figure out why process B is coming up empty.

Beyond a complete misunderstanding by myself of these preconditions, can anyone think of another reason why I would see this behaviour?

Worry not about the lousy architecture, it is going away. I'm just trying to understand why this situation seems to violate the "read committed" transaction isolation as I understand it.

Thanks.

回答1:

According to the postgres docs, one process in a transaction should not "see" changes made by another process in a transaction until they are committed.

Yes and No - as usual, it depends. The documentation strictly says that:

Read Committed is the default isolation level in PostgreSQL.

When a transaction uses this isolation level, a SELECT query (without a FOR UPDATE/SHARE clause) sees only data committed before the query began; it never sees either uncommitted data or changes committed during query execution by concurrent transactions. In effect, a SELECT query sees a snapshot of the database as of the instant the query begins to run. However, SELECT does see the effects of previous updates executed within its own transaction, even though they are not yet committed. Also note that two successive SELECT commands can see different data, even though they are within a single transaction, if other transactions commit changes after the first SELECT starts and before the second SELECT starts.

UPDATE, DELETE, SELECT FOR UPDATE, and SELECT FOR SHARE commands behave the same as SELECT in terms of searching for target rows: they will only find target rows that were committed as of the command start time. However, such a target row might have already been updated (or deleted or locked) by another concurrent transaction by the time it is found. In this case, the would-be updater will wait for the first updating transaction to commit or roll back (if it is still in progress). If the first updater rolls back, then its effects are negated and the second updater can proceed with updating the originally found row. If the first updater commits, the second updater will ignore the row if the first updater deleted it, otherwise it will attempt to apply its operation to the updated version of the row. The search condition of the command (the WHERE clause) is re-evaluated to see if the updated version of the row still matches the search condition. If so, the second updater proceeds with its operation using the updated version of the row. In the case of SELECT FOR UPDATE and SELECT FOR SHARE, this means it is the updated version of the row that is locked and returned to the client.

In other word, simply SELECT differs from SELECT FOR UPDATE/DELETE/UPDATE.

You can create simple test case to observe that behaviour:

Session 1

test=> START TRANSACTION;
START TRANSACTION
test=> SELECT * FROM test;
 x
----
  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
(10 rows)


test=> DELETE FROM test;
DELETE 10
test=>

Now login in another Session 2:

test=> START TRANSACTION;
START TRANSACTION
test=> SELECT * FROM test;
 x
----
  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
(10 rows)


test=> SELECT * FROM test WHERE x = 5 FOR UPDATE;

After the last command SELECT ... FOR UPDATE session 1 "hangs" and is waiting for something ......

Back in session 1

test=> insert into test select * from generate_series(1,10);
INSERT 0 10
test=> commit;
COMMIT

And now when you go back to session 2 you will see this:

test=> SELECT * FROM test WHERE x = 5 FOR UPDATE;
 x
---
(0 rows)


test=> select * from test;
 x
----
  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
(10 rows)

That is - simple SELECT still doesn't see any changes, while SELECT ... FOR UPDATE does see that rows have been deleted. But it doesn't see new rows inserted by session 1

In fact a sequence you are seeing is:

process A starts its transaction
process A deletes everything from table T
process B starts its transaction
process B attempts a select for update on one row in table T
process B "hangs" and is waiting until session A does a commit or rollback
process A repopulates table T from incoming data
process A commits its transaction
process B comes up empty (0 rows- after session A commit) and calls rollback

来源：https://stackoverflow.com/questions/45923021/apparent-transaction-isolation-violation-in-postgresql

标签

postgresql

transactions