Function taking forever to run for large number of records

Posted by …衆ロ難τιáo~ on 2019-12-02 09:21:59
Erwin Brandstetter

Most probably you are running into race conditions. When you run your function 1000 times in quick succession in separate transactions, something like this happens:

T1            T2            T3            ...
SELECT min(id) -- id 1
              SELECT min(id)  -- id 1
                            SELECT min(id)  -- id 1
                                          ...
              Row id 1 locked, wait ...
                            Row id 1 locked, wait ...
UPDATE id 1
                                          ... 

COMMIT
              Wake up, UPDATE id 1 again!
              COMMIT
                            Wake up, UPDATE id 1 again!
                            COMMIT
                                          ... 

Largely rewritten and simplified as SQL function:

CREATE OR REPLACE FUNCTION get_result(val1 text, val2 text)
  RETURNS text AS 
$func$
   UPDATE table t   -- "table" stands in for the actual table name
   SET    id_used = 'Y'
        , col1 = val1
        , id_used_date = now() 
   FROM  (
      SELECT id
      FROM   table 
      WHERE  id_used IS NULL
      AND    id_type = val2
      ORDER  BY id
      LIMIT  1
      FOR    UPDATE   -- lock to avoid race condition! see below ...
      ) t1
   WHERE  t.id_type = val2
   -- AND    t.id_used IS NULL -- repeating the condition is not needed while the row is locked
   AND    t.id = t1.id
   RETURNING  t.id;   -- table-qualified, since both t and t1 expose a column "id"
$func$  LANGUAGE sql;

Related question with a lot more explanation:

Explanation

  • Don't run two separate SQL statements. That is more expensive and widens the time frame for race conditions. One UPDATE with a subquery is much better.

  • You don't need PL/pgSQL for this simple task. You can still use PL/pgSQL; the UPDATE stays the same.

  • You need to lock the selected row to defend against race conditions. But you cannot do this with the aggregate function you had because, per documentation:

The locking clauses cannot be used in contexts where returned rows cannot be clearly identified with individual table rows; for example they cannot be used with aggregation.

  • Bold emphasis mine. Luckily, you can easily replace min(id) with the equivalent ORDER BY / LIMIT 1 I provided above, which can use an index just as well.

  • If the table is big, you need an index on id at least. Assuming that id is indexed already as PRIMARY KEY, that would help. But this additional partial multicolumn index would probably help a lot more:

    CREATE INDEX foo_idx ON table (id_type, id)
    WHERE id_used IS NULL;
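
To illustrate the second point (PL/pgSQL works too, with the same UPDATE), here is a minimal sketch. The table name tbl, the function name get_result_plpgsql, the column types and the integer return type are my assumptions for illustration; only the column names come from the code above.

```sql
-- Hypothetical table: column names follow the answer, types are assumed.
CREATE TABLE IF NOT EXISTS tbl (
   id           integer PRIMARY KEY
 , id_type      text NOT NULL
 , id_used      text            -- NULL = still free, 'Y' = taken
 , col1         text
 , id_used_date timestamptz
);

CREATE OR REPLACE FUNCTION get_result_plpgsql(val1 text, val2 text)
  RETURNS integer
  LANGUAGE plpgsql AS
$func$
DECLARE
   _id integer;
BEGIN
   -- Same single UPDATE as in the SQL function; only the wrapper differs.
   UPDATE tbl t
   SET    id_used = 'Y'
        , col1 = val1
        , id_used_date = now()
   FROM  (
      SELECT id
      FROM   tbl
      WHERE  id_used IS NULL
      AND    id_type = val2
      ORDER  BY id
      LIMIT  1
      FOR    UPDATE            -- lock the picked row
      ) t1
   WHERE  t.id = t1.id
   RETURNING t.id INTO _id;

   RETURN _id;                 -- NULL if no free row of that type was found
END
$func$;
```

Call it like the SQL version, for example SELECT get_result_plpgsql('some value', 'A'); it should return NULL once no free row of the requested type is left.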
    

Alternative solutions

Advisory locks may be the superior approach here:
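
A rough sketch of the advisory-lock idea, not a drop-in solution. It assumes id is an integer type (advisory lock keys are bigint) and uses a hypothetical table tbl with assumed column types and made-up sample rows. pg_try_advisory_xact_lock() returns false instead of waiting when another transaction already holds the lock for that key, so concurrent callers move on to the next free row rather than queueing on the same one.

```sql
-- Hypothetical table and sample rows; types are assumptions.
CREATE TABLE IF NOT EXISTS tbl (
   id           integer PRIMARY KEY
 , id_type      text NOT NULL
 , id_used      text
 , col1         text
 , id_used_date timestamptz
);
INSERT INTO tbl (id, id_type) VALUES (101, 'B'), (102, 'B')
ON CONFLICT (id) DO NOTHING;

UPDATE tbl t
SET    id_used = 'Y'
     , col1 = 'some value'
     , id_used_date = now()
FROM  (
   SELECT id
   FROM   tbl
   WHERE  id_used IS NULL
   AND    id_type = 'B'
   AND    pg_try_advisory_xact_lock(id)   -- skip ids locked by other transactions
   ORDER  BY id
   LIMIT  1
   ) t1
WHERE  t.id = t1.id
AND    t.id_used IS NULL                  -- re-check: the row may have been taken meanwhile
RETURNING t.id;
```

Two caveats as far as I can tell: the planner may evaluate the lock function for more rows than the LIMIT returns, so a call can hold extra advisory locks until its transaction ends; and the repeated id_used IS NULL condition matters here, because an advisory lock does not protect against a row that was updated and committed just before the lock was obtained.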

Or you may want to lock many rows at once:
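
One possible way to do that, as a sketch under assumptions: FOR UPDATE SKIP LOCKED (PostgreSQL 9.5 or later) skips rows that another transaction has locked instead of waiting for them, so several rows can be claimed in one statement. The table tbl, its column types, the batch size of 2 and the sample rows are hypothetical.

```sql
-- Hypothetical table and sample rows; types are assumptions.
CREATE TABLE IF NOT EXISTS tbl (
   id           integer PRIMARY KEY
 , id_type      text NOT NULL
 , id_used      text
 , col1         text
 , id_used_date timestamptz
);
INSERT INTO tbl (id, id_type) VALUES (201, 'C'), (202, 'C'), (203, 'C')
ON CONFLICT (id) DO NOTHING;

UPDATE tbl t
SET    id_used = 'Y'
     , col1 = 'some value'
     , id_used_date = now()
FROM  (
   SELECT id
   FROM   tbl
   WHERE  id_used IS NULL
   AND    id_type = 'C'
   ORDER  BY id
   LIMIT  2                    -- batch size, chosen arbitrarily here
   FOR    UPDATE SKIP LOCKED   -- take only rows nobody else has locked
   ) t1
WHERE  t.id = t1.id
RETURNING t.id;
```

With SKIP LOCKED, concurrent callers get disjoint sets of rows without blocking each other, at the price of not necessarily receiving the lowest free ids.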
