Question

Question Summary

This is a question about serializability of queries within a SQL transaction.

Specifically, I am using PostgreSQL. It may be assumed that I am using the most current version of PostgreSQL. From what I have read, I believe the technology used to support what I am trying to do is known as "MultiVersion Concurrency Control", or "MVCC".

To sum it up: if I have one primary table and more than one table linked to it by foreign key, how do I guarantee that any number of SELECT statements inside one transaction, each using the same key and each reading from any of the linked tables, will return the data as it existed at the time I started the transaction?

Example

Let's say I have 3 tables:

bricks
    brickworks (primary key)
    completion_time (primary key)
    has_been_sold

brick_colors
    brickworks (primary key, foreign key pointing to "bricks")
    completion_time (primary key, foreign key pointing to "bricks")
    quadrant (primary key)
    color

brick_weight
    brickworks (primary key, foreign key pointing to "bricks")
    completion_time (primary key, foreign key pointing to "bricks")
    weight

A brickworks produces one brick at a time. It makes bricks that may be of different colors in each of its 4 quadrants.

Someone later analyzes the bricks to determine their color combination, and writes the results to the brick_colors table.

Someone else analyzes the bricks to determine their weight, and writes the results to the brick_weight table.

At any given time, an existing brick may or may not have a recorded color, and may or may not have a recorded weight.

An application exists, and this application receives word that someone wants to buy a particular brick (already known at this point to the application by its brickworks/completion_time composite key).

The application wants to select all known properties of the brick AT THE EXACT TIME IT STARTS THE QUERY.

If color or weight information is added MID-TRANSACTION, the application does NOT want to know about it.

The application wants to perform SEPARATE QUERIES (not a SELECT with multiple JOINs to the foreign-key-linked tables, which might return multiple rows because of the brick_colors table).

This example is deliberately simple; the desire to do this without one SELECT with multiple JOINs would be clearer if my example included, say, 10 foreign-key-linked tables, and many or all of them could return multiple rows for the same primary key (like brick_colors does in the example as I have it above).

Attempted Solution

Here's what I've come up with so far:

BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE READ ONLY ;

-- All this statement accomplishes is telling the database what rows should be returned from the present point-in-time in future queries within the transaction
SELECT DISTINCT true
FROM bricks b
LEFT JOIN brick_colors bc ON bc.brickworks = b.brickworks AND bc.completion_time = b.completion_time
LEFT JOIN brick_weight bw ON bw.brickworks = b.brickworks AND bw.completion_time = b.completion_time
WHERE b.brickworks = 'Brick-o-Matic' AND b.completion_time = '2017-02-01T07:35:00.000Z' ;

SELECT * FROM brick_colors WHERE brickworks = 'Brick-o-Matic' AND completion_time = '2017-02-01T07:35:00.000Z' ;
SELECT * FROM brick_weight WHERE brickworks = 'Brick-o-Matic' AND completion_time = '2017-02-01T07:35:00.000Z' ;

COMMIT ;

It just seems wasteful to use that first SELECT with the JOINs solely for purposes of ensuring serializability.

Is there any other way to do this?

References

PostgreSQL Concurrency Control

PostgreSQL Transaction Isolation

PostgreSQL SET TRANSACTION statement

Answer 1:

This is the essence of your question:

how do I guarantee that, for ... any number of SELECT statements ... inside one transaction ... I will get data as it existed at the time I started the transaction?

This is exactly what the Repeatable Read isolation level guarantees:

The Repeatable Read isolation level only sees data committed before the transaction began; it never sees either uncommitted data or changes committed during transaction execution by concurrent transactions. (However, the query does see the effects of previous updates executed within its own transaction, even though they are not yet committed.) This is a stronger guarantee than is required by the SQL standard for this isolation level, and prevents all of the phenomena described in Table 13-1. As mentioned above, this is specifically allowed by the standard, which only describes the minimum protections each isolation level must provide.

This level is different from Read Committed in that a query in a repeatable read transaction sees a snapshot as of the start of the transaction, not as of the start of the current query within the transaction. Thus, successive SELECT commands within a single transaction see the same data, i.e., they do not see changes made by other transactions that committed after their own transaction started.
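The isolation level can also be set as the first statement after a plain BEGIN, and the level in effect can be checked with SHOW. A minimal sketch, assuming a psql session on any database where you are allowed to open a transaction:

```sql
-- Open a transaction block with the session's default isolation level.
BEGIN;

-- Raise the level for this transaction only. This must come before
-- the first query or data-modifying statement of the transaction.
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;

-- Report the isolation level of the current transaction.
SHOW transaction_isolation;

COMMIT;
```

If SET TRANSACTION is issued after a query has already run in the transaction, PostgreSQL rejects it, so the single-statement form START TRANSACTION ISOLATION LEVEL REPEATABLE READ is usually the safer habit.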

A practical example: let's say we have two simple tables:

CREATE TABLE t1( x int );
INSERT INTO t1 VALUES (1),(2),(3);
CREATE TABLE t2( y int );
INSERT INTO t2 VALUES (1),(2),(3);

The number of tables, their structures, primary keys, foreign keys, etc. are unimportant here.

Let's open a first session, start a transaction at the repeatable read isolation level, and run two simple, separate SELECT statements:

test=# START TRANSACTION ISOLATION LEVEL REPEATABLE READ;
START TRANSACTION
test=# SELECT * FROM t1;
 x
---
 1
 2
 3
(3 rows)


test=# SELECT * FROM t2;
 y
---
 1
 2
 3
(3 rows)

Note that the START TRANSACTION command opens a transaction block, so statements in this session are no longer committed automatically until the transaction ends.

Now, in another session (with the default autocommit mode enabled), insert a few records into t1:

test2=# INSERT INTO t1 VALUES(10),(11);

The new values were inserted and automatically committed (because autocommit is on).

Now go back to the first session and run SELECT again:

test=# select * from t1;
 x
---
 1
 2
 3
(3 rows)

As you can see, session 1 (with the active repeatable read transaction) doesn't see any changes committed after the start of the transaction.

Let's do the same experiment with table t2. Go to the second session and issue:

test2=# DELETE FROM t2 WHERE y = 2;
DELETE 1

Now go back to the first session and run SELECT again:

test=# SELECT * FROM t2;
 y
---
 1
 2
 3
(3 rows)

Again, session 1 (with the active repeatable read transaction) doesn't see the change committed after the start of the transaction: the deleted row is still visible.

Now, in session 1, run both SELECTs once more, finish the transaction with COMMIT, and then run the SELECTs again:

test=# SELECT * FROM t1;
 x
---
 1
 2
 3
(3 rows)

test=# SELECT * FROM t2;
 y
---
 1
 2
 3
(3 rows)

test=# COMMIT;
COMMIT

test=# select * from t1;
 x
----
  1
  2
  3
 10
 11
(5 rows)


test=# select * from t2;
 y
---
 1
 3
(2 rows)

As you can see, while the repeatable read transaction is active, you can run many separate SELECT statements as often as you like, and all of them see the same stable snapshot of the data as of the start of the transaction, regardless of any data committed in other sessions.
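Applied to the tables from the question, this could look like the sketch below. The table names, column names, and key values are taken from the question. READ ONLY is optional and is kept only because the application merely reads. No initial JOIN should be needed: in PostgreSQL the snapshot is established by the first query in the transaction and covers every table, so the later SELECTs see the same point in time.

```sql
-- Sketch only: assumes the bricks, brick_colors and brick_weight
-- tables described in the question.
START TRANSACTION ISOLATION LEVEL REPEATABLE READ READ ONLY;

-- The first query establishes the snapshot used for the rest of
-- the transaction.
SELECT * FROM bricks
WHERE brickworks = 'Brick-o-Matic'
  AND completion_time = '2017-02-01T07:35:00.000Z';

-- These queries read from the same snapshot, even if another session
-- commits new color or weight rows in the meantime.
SELECT * FROM brick_colors
WHERE brickworks = 'Brick-o-Matic'
  AND completion_time = '2017-02-01T07:35:00.000Z';

SELECT * FROM brick_weight
WHERE brickworks = 'Brick-o-Matic'
  AND completion_time = '2017-02-01T07:35:00.000Z';

COMMIT;
```

Because the snapshot is taken by the first query rather than by the START TRANSACTION statement itself, the "point in time" is effectively the moment the first SELECT runs. For this use case that difference should not matter, since nothing is read before it.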

Source: https://stackoverflow.com/questions/42319573/transaction-isolation-across-multiple-tables-using-postgresql-mvcc

Tags

postgresql

mvcc
