Postgres deadlocks on concurrent upserts

倾然丶 夕夏残阳落幕 提交于 2019-12-06 05:19:28

问题


We have an application which reads from a data stream and upserts that information into a database. The data is changes which occur on Google Drive which means that many events which impact the same objects can occur very close to each other.

We're running into deadlocks when upserting this information into the database, here is what comes out in the log. I have reconstructed and sanitised the query for readability:

ERROR:  deadlock detected
DETAIL:  Process 10586 waits for ShareLock on transaction 166892743; blocked by process 10597.
  Process 10597 waits for ShareLock on transaction 166892741; blocked by process 10586.
  Process 10586: 
          INSERT INTO documents
              (version, source, source_id, ingestion_date)
          VALUES
              (0, 'googledrive', 'alpha', '2017-09-21T07:03:51.074Z'),
              (0, 'googledrive', 'beta', '2017-09-21T07:03:51.074Z')
              (0, 'googledrive', 'gamma', '2017-09-21T07:03:51.074Z'),
              (0, 'googledrive', 'delta', '2017-09-21T07:03:51.074Z'),
              (0, 'googledrive', 'epsilon', '2017-09-21T07:03:51.074Z'),
              (0, 'googledrive', 'zeta', '2017-09-21T07:03:51.074Z')

          ON CONFLICT (source, source_id)
          DO UPDATE
          SET
              ingestion_date = EXCLUDED.ingestion_date,
              version = documents.version + 1

          RETURNING source_id, source, uid

  Process 10597: 
          INSERT INTO documents
              (version, source, source_id, ingestion_date)
          VALUES
              (0, 'googledrive', 'delta', '2017-09-21T07:03:51.167Z'),
              (0, 'googledrive', 'gamma', '2017-09-21T07:03:51.167Z')

          ON CONFLICT (source, source_id)
          DO UPDATE
          SET
              ingestion_date = EXCLUDED.ingestion_date,
              version = documents.version + 1

          RETURNING source_id, source, uid

HINT:  See server log for query details.
CONTEXT:  while locking tuple (3908269,11) in relation "documents"
STATEMENT:  
          INSERT INTO documents
              (version, source, source_id, ingestion_date)
          VALUES
              (0, 'googledrive', 'alpha', '2017-09-21T07:03:51.074Z'),
              (0, 'googledrive', 'beta', '2017-09-21T07:03:51.074Z'),
              (0, 'googledrive', 'gamma', '2017-09-21T07:03:51.074Z'),
              (0, 'googledrive', 'delta', '2017-09-21T07:03:51.074Z'),
              (0, 'googledrive', 'epsilon', '2017-09-21T07:03:51.074Z'),
              (0, 'googledrive', 'zeta', '2017-09-21T07:03:51.074Z')

          ON CONFLICT (source, source_id)
          DO UPDATE
          SET
              ingestion_date = EXCLUDED.ingestion_date,
              version = documents.version + 1

          RETURNING source_id, source, uid

The schema:

    Column      |            Type             |                             Modifiers
----------------+-----------------------------+-------------------------------------------------------------------
uid             | uuid                        | not null default gen_random_uuid()
date_created    | timestamp without time zone | not null default now()
sequence_id     | bigint                      | not null default nextval('documents__sequence_id__seq'::regclass)
version         | integer                     | not null default 0
source          | text                        | not null
source_id       | text                        | not null
ingestion_date  | timestamp without time zone | not null

Indexes:
    "documents__pkey" PRIMARY KEY, btree (uid)
    "documents__sequence_id__unique" UNIQUE CONSTRAINT, btree (sequence_id)
    "documents__source__source_id__deleted" UNIQUE, btree (source, source_id)
    "documents__ingestion_date__idx" btree (ingestion_date)
    "documents__source_id__source__idx" btree (source_id, source)

I suspect the problem is something like "the first transaction was locking rows with source_id alpha, beta, gamma in sequence, meanwhile the second transaction was locking rows with source_id delta, gamma in the reverse order, and the deadlock happened at the point they both locked gamma and delta", however the timing here is very tight!

What would the solution for this be? Sorting our values list by source_id?


回答1:


I can think of three solutions:

  1. You insert only one row per statement, but that's inefficient.

  2. You sort the rows before inserting them.

  3. You retry a transaction if it gets a deadlock or serialization error.

I'd prefer the third solution unless the errors happen very often.




回答2:


Your query's syntax allows ordering the values easily:

INSERT INTO documents
          (version, source, source_id, ingestion_date)
   SELECT * FROM (
      VALUES
          (0, 'googledrive', 'alpha', '2017-09-21T07:03:51.074Z'),
          (0, 'googledrive', 'beta', '2017-09-21T07:03:51.074Z')
          (0, 'googledrive', 'gamma', '2017-09-21T07:03:51.074Z'),
          (0, 'googledrive', 'delta', '2017-09-21T07:03:51.074Z'),
          (0, 'googledrive', 'epsilon', '2017-09-21T07:03:51.074Z'),
          (0, 'googledrive', 'zeta', '2017-09-21T07:03:51.074Z')
      ) AS v ORDER BY source, source_id

      ON CONFLICT (source, source_id)

This should solve your problem. Performance should be nice, as the sort will be tiny.



来源:https://stackoverflow.com/questions/46366324/postgres-deadlocks-on-concurrent-upserts

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!