How to improve performance of a function with cursors in PostgreSQL?

I have function with two nested cursors. The outer cursor gets payment details of a customer from the source and inserts into the target based on some business logic. The inner cursor takes the payment details of each payment, it happens one after another.

The payments table has about 125000 rows, and about 335000 rows for payment details. All of these rows are to be migrated to a target table. Executing the function takes over two hours and the database CPU usage goes up to 99%.
I am working with PostgreSQL 9.2.

How can I improve the performance of the function?

The code I am using:

CREATE OR REPLACE FUNCTION ccdb_dummy.o_payments1(a integer)
  RETURNS void AS
$BODY$
DECLARE
cursor_1  refcursor;
cursor_2  refcursor;
rowcount integer;
rowcount1 integer;
payment_no bigint;
query_1 character varying(250);
lc_cin_num bigint;
test character varying(50);
t_payments ccdb_stg.o_payments_stg%ROWTYPE;
t_payments_details ccdb_stg.o_payment_head_dtls_stg%ROWTYPE;

BEGIN
rowcount := 0;

open cursor_1 for select * from ccdb_stg.o_payments_stg WHERE section_code = a;

select count(1) into rowcount from ccdb_stg.o_payments_stg WHERE section_code = a;

for i IN 1..rowcount loop

fetch cursor_1 into t_payments;

payment_no= nextval('ccdb_stg.payments_seq');

select cin into lc_cin_num from ccdb_dummy.consumers a where a.consumer_num =           t_payments.consumer_num;

insert into  ccdb_dummy.payments(payment_id,receipt_id,source_system_flag,cin,consumer_nbr,cust_connection_id,cust_type_flg,receipt_type_id,mop_code,mop_details,coll_effect_date,coll_entry_date,receipt_num,receipt_amt,receipt_loc_flg,receipt_date,cancel_flag,acc_type_id,cust_section_code,coll_section_code,remarks,pm_paydate,pm_amount,ref_transaction_id,creation_dt,created_by)    values(payment_no,t_payments.receipt_id,t_payments.origin_flag,lc_cin_num,t_payments.consumer_num,t_payments.cust_connection_id,t_payments.cust_type_flag,t_payments.receipt_type_id,t_payments.mop_id,t_payments.mop_details,t_payments.coll_effect_date,t_payments.coll_entry_date,t_payments.receipt_num,t_payments.receipt_amt,t_payments.receipt_flag,t_payments.receipt_date,t_payments.cancel_flag,t_payments.acc_type_flag,t_payments.cust_section_code,t_payments.coll_section_code,t_payments.remarks,t_payments.pm_paydate,t_payments.pm_amount,null,now(),'system');

select count(1) into rowcount1 from ccdb_stg.o_payment_head_dtls_stg a where a.mbc_receipt_id = t_payments.receipt_id;

open cursor_2 for select * from ccdb_stg.o_payment_head_dtls_stg a where a.mbc_receipt_id = t_payments.receipt_id;

for i IN 1..rowcount1 loop

fetch cursor_2 into t_payments_details;

insert into ccdb_dummy.payment_head_dtls(payment_id,mbc_receipt_id,charge_head_code,amount,tariff_id,creation_dt,created_by)
values (payment_no,t_payments_details.mbc_receipt_id,t_payments_details.charge_head_code,t_payments_details.amount,t_payments_details.tariff_id,now(),'system');

end loop;

close cursor_2;

end loop;

close cursor_1;

END;
$BODY$
  LANGUAGE plpgsql;

Erwin Brandstetter

In a first step, I radically simplified your procedural code:

CREATE OR REPLACE FUNCTION ccdb_dummy.o_payments1(a integer)
  RETURNS void AS
$func$
DECLARE
   t   record;
   t1  record;
BEGIN

FOR t IN
   SELECT *
         ,nextval('ccdb_stg.payments_seq') AS payment_no
         ,c.cin
   FROM   ccdb_stg.o_payments_stg   p
   LEFT   JOIN ccdb_dummy.consumers c USING (consumer_num)
   WHERE  p.section_code = $1
LOOP

   INSERT INTO ccdb_dummy.payments(payment_id,receipt_id,source_system_flag,cin, ... ,pm_amount,ref_transaction_id,creation_dt,created_by)
   VALUES(t.payment_no,t.receipt_id,t.origin_flag,t.cin, ... ,t.pm_amount,null,now(),'system');

   FOR t1 IN
      SELECT *
      FROM   ccdb_stg.o_payment_head_dtls_stg h
      WHERE  h.mbc_receipt_id = t.receipt_id
   LOOP
      INSERT INTO ccdb_dummy.payment_head_dtls(payment_id,mbc_receipt_id,charge_head_code,amount,tariff_id,creation_dt,created_by)
      VALUES (t.payment_no,t1.mbc_receipt_id,t1.charge_head_code,t1.amount,t1.tariff_id,now(),'system');
   END LOOP;
END LOOP;

END
$func$  LANGUAGE plpgsql;

Use the implicit cursor of a FOR LOOP instead of unwieldy explicit cursors coupled with redundant counts and loops. Much simpler and faster. Read the chapter "Looping Through Query Results" in the manual.
LEFT JOIN to ccdb_dummy.consumers in the first SELECT instead of running a separate select for every row.
Also include nextval('ccdb_stg.payments_seq') AS payment_no in the first SELECT. cheaper than lots of separate queries.
Minor detail: assignment operator in plpgsql is :=, not =. Details here.

But that's far from perfect, still. Consider a completely new approach with set-based operations instead of individual inserts in loops. Much cleaner and faster, yet. That's how modern RDBMS operate best.

One SQL statement with a data-modifying CTE

Wrapped into an SQL function to be a drop-in replacement.
Data-modifying CTEs require Postgres 9.1 or later.

CREATE OR REPLACE FUNCTION ccdb_dummy.o_payments2(integer)
  RETURNS void AS
$func$

WITH ins1 AS (
   INSERT INTO ccdb_dummy.payments(
          payment_id,                        cin,  receipt_id,   ...  ,   pm_amount, ref_transaction_id,creation_dt,created_by)   
   SELECT nextval('ccdb_stg.payments_seq'),c.cin,p.receipt_id,   ...  , p.pm_amount, null,              now(),      'system'
   FROM   ccdb_stg.o_payments_stg   p
   LEFT   JOIN ccdb_dummy.consumers c USING (consumer_num)
   WHERE  p.section_code = $1
   RETURNING payment_id, receipt_id
   )
INSERT INTO ccdb_dummy.payment_head_dtls(
         payment_id,  mbc_receipt_id,  charge_head_code,  amount,  tariff_id,creation_dt,created_by)
SELECT i.payment_id,h.mbc_receipt_id,h.charge_head_code,h.amount,h.tariff_id,now(),      'system'
FROM   ins1 i
JOIN   ccdb_stg.o_payment_head_dtls_stg h ON h.mbc_receipt_id = i.receipt_id;

$func$  LANGUAGE sql;

Should do the same as the above plpgsql function exactly (barring errors in translation). Just much simpler and faster.

Find more examples for INSERTs using data-modifying CTEs here on SO.

来源：https://stackoverflow.com/questions/21892983/how-to-improve-performance-of-a-function-with-cursors-in-postgresql

标签

function

postgresql

cursor

plpgsql

database-performance