I'm working on a design for a concurrency-safe incremental aggregate rollup system, and track_commit_timestamp (pg_xact_commit_timestamp) sounds perfect. But I've found very little discussion of it anywhere.
Laurenz, first off, you're a champion for digging in and helping me out. Thank you. For background, I've asked this question in more detail on a few of the PG mailing lists, and got zero responses. I think it was because my complete question was too long.
I tried to be shorter here and, sadly, have not explained the important part clearly. Physical optimization is not the driving concern. In fact, the commit_timestamp system will cost me space as it's a global setting for all tables. My real tables will have full timestamptz (set to UTC) fields that I'll index and aggregate against. What I'm trying to sort out now (design phase) is the accuracy of the approach. Namely, am I capturing all events once and only once?
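For reference, this is the cluster-wide setting I mean; as I understand it, it only takes effect after a server restart:

-- Cluster-wide; applies to every table, and needs a restart to take effect.
ALTER SYSTEM SET track_commit_timestamp = on;
-- ...restart the server, then confirm:
SHOW track_commit_timestamp;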
What I need is a reliable sequential number or timeline to mark the highest/latest row I processed and the current highest/latest row. This lets me grab any rows that haven't been processed yet without re-selecting already handled rows, and without blocking the table as new rows are added. This idea is called a "concurrency ID" in some contexts. Here's a sketch adapted from another part of our project where it made sense to use numbers instead of timestamps (but a timeline is a kind of number line):
D'oh! I can't post images. It's here:
https://imgur.com/iD9bn5Q
It shows a number line for tracking records, split into three portions: [Done] [Capture these] [Tailing]
"Done" is everything up to and including the highest/latest counter already processed.
"Capture these" is everything later than "Done" and up to (and including) the current max counter in the table.
"Tailing" is any new, higher counters added by other inputs while the "capture these" rows are being processed.
It's easier to see in a picture.
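In lieu of the picture, here's roughly how those three portions translate into predicates (the two boundary names are hypothetical, just for illustration):

-- last_processed = rollup_status.last_processed_dts for this rollup
-- current_max    = max(pg_xact_commit_timestamp(xmin)) taken at the start of the run
-- "Done":          pg_xact_commit_timestamp(xmin) <= last_processed
-- "Capture these": pg_xact_commit_timestamp(xmin) >  last_processed
--                  and pg_xact_commit_timestamp(xmin) <= current_max
-- "Tailing":       pg_xact_commit_timestamp(xmin) >  current_max   (left for the next run)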
So, I've got a small utility table such as this:
CREATE TABLE "rollup_status" (
    "id" uuid NOT NULL DEFAULT extensions.gen_random_uuid(), -- We use UUIDs, not necessary here, but it's what we use.
    "rollup_name" text NOT NULL,
    "last_processed_dts" timestamptz NOT NULL DEFAULT '-infinity'); -- Marks the last timestamp processed; starts at "the 0 date".
And now imagine one entry:
rollup_name last_processed_dts
error_name_counts 2018-09-26 02:23:00
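A row like that gets seeded with something along these lines (a sketch; the id fills in from the column default, and I'm assuming the timestamp is UTC):

-- One bookmark row per rollup over the same base table.
INSERT INTO rollup_status (rollup_name, last_processed_dts)
VALUES ('error_name_counts', '2018-09-26 02:23:00+00');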
So, my number line (timeline, in the case of the commit timestamps) is processed from whatever the 0 date is through 2018-09-26 02:23:00. The next time through, I get the current max from the table I'm interested in, 'scan':
select max(pg_xact_commit_timestamp(xmin)) from scan; -- Pretend that it's 2019-07-07 21:00:00.000000+10.
This value becomes the upper bound of my search, and the new value of rollup_status.last_processed_dts.
-- Find the changed row(s):
select *
  from scan
 where pg_xact_commit_timestamp(xmin) >  '2019-07-07 20:46:14.694288+10'
   and pg_xact_commit_timestamp(xmin) <= '2019-07-07 21:00:00.000000+10';
That's the "capture these" segment of my number line. This is also the only use I've got planned for the commit timestamp data. We're pushing data in from various sources, and want their timestamps (adjusted to UTC), not a server timestamp. (Server timestamps can make sense, they just don't happen to in the case of our data.) So, the sole purpose of the commit timestamp is to create a reliable number line.
If you look at the chart, it shows three different number lines for the same base table. The table itself has only one number line (or timeline); there are three different uses of that number/time series. So, three rollup_status rows, going with my sketch table from earlier. The "scan" table needs to know nothing about how it is used. This is a huge benefit of this strategy. You can add, remove, and redo operations without having to alter the master table or its rows at all.
I'm also considering an AFTER INSERT/UPDATE statement-level trigger with a transition table for populating a timestamptz column (set to UTC), something like row_committed_dts. That might be my plan B, but it requires adding the triggers, and it seems like it could only be a bit less accurate than the actual transaction commit time. Probably a small difference, but with concurrency stuff, little problems can blow up into big bugs in a hurry.
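For what it's worth, the simplest version of that plan B I can sketch uses a BEFORE row-level trigger rather than a transition table, since a column on the row itself is easiest to set before the row is written (function and trigger names are hypothetical):

-- Plan B sketch: stamp each row as it's written. Assumes scan has a
-- row_committed_dts timestamptz column. Note this is the wall-clock time at
-- insert/update, not the commit time, so it inherits the same "can appear in
-- the past" worry for long-running transactions.
CREATE OR REPLACE FUNCTION set_row_committed_dts()
RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    NEW.row_committed_dts := clock_timestamp(); -- timestamptz is stored as UTC internally
    RETURN NEW;
END;
$$;

CREATE TRIGGER scan_set_row_committed_dts
    BEFORE INSERT OR UPDATE ON scan
    FOR EACH ROW
    EXECUTE FUNCTION set_row_committed_dts();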
So, the question is whether I can count on the commit timestamp system to produce accurate results that won't appear "in the past." That's why I can't use transaction IDs: they're assigned at the start of the transaction, but can commit in any order (as I understand it). Therefore, my range boundaries of "last processed" and "current maximum in the table" can't work. I could get that range, and a pending transaction could then commit thousands of records with IDs earlier than my previously recorded "max value." That's why I'm after commit timestamps.
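To make that concrete, here's the sort of interleaving I'm worried about with plain transaction IDs (a made-up trace, two sessions):

-- Made-up trace of the problem with ranging on transaction IDs:
--   Session A: BEGIN;                  (assigned xid 1000)
--   Session A: INSERT INTO scan (...);
--   Session B: BEGIN;                  (assigned xid 1001)
--   Session B: INSERT INTO scan (...);
--   Session B: COMMIT;                 (commits first)
--   Rollup pass runs here and records 1001 as the highest value processed.
--   Session A: COMMIT;                 (xid 1000 commits last; ranging on xid
--                                       would skip its rows forever)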
Again, thanks for any help or suggestions. I'm very grateful.
P.S. The only discussion I've run into in the Postgres world of something like this is here:
Scalable incremental data aggregation on Postgres and Citus https://www.citusdata.com/blog/2018/06/14/scalable-incremental-data-aggregation/
They're using bigserial counters in this way but, as far as I understand it, that only works for INSERT, not UPDATE. And, honestly, I don't know enough about Postgres transactions and serials to think through the concurrency behavior.