Disclaimer: theoretical question.
Several questions here was asked about how to differentiate inserted and updated rows in the PostgreSQL upsert statement.
Here is a simple example:
nd@postgres=# create table t(i int primary key, x int);
nd@postgres=# insert into t values(1,1);
nd@postgres=# insert into t values(1,11),(2,22)
on conflict(i) do update set x = excluded.i*11
returning *, xmin, xmax;
╔═══╤════╤══════╤══════╗
║ i │ x │ xmin │ xmax ║
╠═══╪════╪══════╪══════╣
║ 1 │ 11 │ 7696 │ 7696 ║
║ 2 │ 22 │ 7696 │ 0 ║
╚═══╧════╧══════╧══════╝
So, xmax > 0 (or xmax = xmin) - row was updated; xmax = 0 - row was inserted.
IMO It is not too clear explained the meanings of the xmin and xmax columns here.
Is it possible to base the logic on those columns? Is there any more significant explanation about system columns (except the source code)?
And finally is my guess right about updated/inserted rows?
I think that this is an interesting question that deserves an in-depth answer; please bear with me if it is a little bit lengthy.
In short: Your guess is right, and you can use the following RETURNING clause to determine if the row was inserted and not updated:
RETURNING (xmax = 0) AS inserted
Now the detailed explanation:
When a row is updated, PostgreSQL does not modify the data, but creates a new version of the row; the old version will be deleted by autovacuum when it is no longer needed. A version of a row is called a tuple, so in PostgreSQL there can be more than one tuples per row.
xmax serves two different purposes:
As stated in the documentation, it can be the transaction ID of the transaction that deleted (or updated) the tuple (“tuple” is another word for “row”). Only transactions with a transaction ID between
xminandxmaxcan see the tuple. An old tuple can be deleted safely if there is no transaction with a transaction ID less thanxmax.xmaxis also used to store row locks. In PostgreSQL, row locks are not stored in the lock table, but in the tuple to avoid overflow of the lock table.
If only one transaction has a lock on the row,xmaxwill contain the transaction ID of the locking transaction. If more than one transaction has a lock on the row,xmaxcontains the number of a so-called multixact, which is a data structure that in turn contains the transaction IDs of the locking transactions.
The documentation of xmax is not complete, because the exact meaning of this field is considered an implementation detail and cannot be understood without knowing t_infomask of the tuple, which is not immediately visible via SQL.
You can install the contrib module pageinspect to view this and other fields of a tuple.
I ran your example, and this is what I see when I use the heap_page_items function to examine details (the transaction ID numbers are of course different in my case):
SELECT *, ctid, xmin, xmax FROM t;
┌───┬────┬───────┬────────┬────────┐
│ i │ x │ ctid │ xmin │ xmax │
├───┼────┼───────┼────────┼────────┤
│ 1 │ 11 │ (0,2) │ 102508 │ 102508 │
│ 2 │ 22 │ (0,3) │ 102508 │ 0 │
└───┴────┴───────┴────────┴────────┘
(2 rows)
SELECT lp, lp_off, t_xmin, t_xmax, t_ctid,
to_hex(t_infomask) AS t_infomask, to_hex(t_infomask2) AS t_infomask2
FROM heap_page_items(get_raw_page('laurenz.t', 0));
┌────┬────────┬────────┬────────┬────────┬────────────┬─────────────┐
│ lp │ lp_off │ t_xmin │ t_xmax │ t_ctid │ t_infomask │ t_infomask2 │
├────┼────────┼────────┼────────┼────────┼────────────┼─────────────┤
│ 1 │ 8160 │ 102507 │ 102508 │ (0,2) │ 500 │ 4002 │
│ 2 │ 8128 │ 102508 │ 102508 │ (0,2) │ 2190 │ 8002 │
│ 3 │ 8096 │ 102508 │ 0 │ (0,3) │ 900 │ 2 │
└────┴────────┴────────┴────────┴────────┴────────────┴─────────────┘
(3 rows)
The meanings of t_infomask and t_infomask2 can be found in src/include/access/htup_details.h. lp_off is the offset of the tuple data in the page, and t_ctid is the current tuple ID which consists of the page number and a tuple number within the page. Since the table was newly created, all data are in page 0.
Let me discuss the three rows returned by heap_page_items.
At line pointer (
lp) 1 we find the old, updated tuple. It originally hadctid = (0,1), but that got modified to contain the tuple ID of the current version during update. The Tuple was created by transaction 102507 and invalidated by transaction 102508 (the transaction that issued theINSERT ... ON CONFLICT). This tuple is not visible any more an will get removed duringVACUUM.t_infomaskshows that bothxminandxmaxbelong to committed transactions and consequently show when the tuples was created and deleted.t_infomask2shows that the tuple was updated with a HOT (heap only tuple) update, which means that the updated tuple is in the same page as the original tuple and no indexed column was modified (seesrc/backend/access/heap/README.HOT).At line pointer 2 we see the new, updated tuple that was created by transaction the
INSERT ... ON CONFLICT(transaction 102508).t_infomaskshows that this tuple is the result of an update,xminis valid, andxmaxcontains aKEY SHARErow lock (which is no longer relevant since the transaction has completed). This row lock was taken duringINSERT ... ON CONFLICTprocessing.t_infomask2shows that this is a HOT tuple.At line pointer 3 we see the newly inserted row.
t_infomaskshows thatxminis valid andxmaxis invalid.xmaxis set to 0 because this value is always used for newly inserted tuples.
So the nonzero xmax of the updated row is an implementation artifact caused by a row lock. It is conceivable that INSERT ... ON CONFLICT is reimplemented one day so that this behaviour changes, but I think that is unlikely.
来源:https://stackoverflow.com/questions/39058213/postgresql-upsert-differentiate-inserted-and-updated-rows-using-system-columns-x