Postgres stored function input-checking overhead, interpreting timing results


Question


While answering another question, Klin demonstrated an easy way of doing some loose timing tests. The question is "How expensive are exceptions?" There are mentions in the documentation and elsewhere that PL/PgSQL is slower than SQL for stored functions, and that EXCEPTION is expensive. I have no intuition about Postgres' performance in these situations, and figured I'd try out a few comparisons. Klin showed how to use the (wonderful) generate_series() function to make this easy.
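
For reference, the basic shape of every timing run (the full set is listed with the rest of the test code below) is just the function called 1,000,000 times inside an aggregate:

-- One of the actual test queries, shown early for context
explain analyse
select sum(test_sql())
from generate_series(1, 1000000);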

And here's the necessary preamble:

  • I swear I'm not starting a fight about speed tests. I have less than no interest in that.

  • These are loose, artificial test cases. I'm just trying to get a feel for how the different styles compare to each other. Basically: what's the baseline overhead in stored functions for various approaches to input validation?

  • SQL and PL/PgSQL aren't interchangeable, so it's not quite fair to compare them 1:1. If you can do something in pure SQL, great. But that's not always possible.

  • These tests run each function 1,000,000 times to amplify what are, in absolute terms, minuscule differences in execution time.

  • The numbers are rounded to the nearest 10 ms...and even then they're a bit misleading. With modern CPUs and contemporary OSs, several percent of variability across "identical" runs is normal.

Just as important, the tests aren't directly comparable, as the routines do somewhat different things. So if you're interested in this question, you have to read the code. The tests attempt to compare a few things:

  • SQL vs PL/PgSQL for a simple operation.
  • The cost of an unused EXCEPTION block.
  • The cost of an unused IF...ELSE...END IF block.
  • The cost of an EXCEPTION block and RAISE to check an input parameter.
  • The cost of an IF...ELSE...END IF block and RAISE to check an input parameter.
  • The cost of a DOMAIN-based constraint to short-circuit calls with a bad input parameter.

Here's a summary of execution times for 1,000,000 iterations each using PG 12.1:

Language    Function                     Error     Milliseconds
SQL         test_sql                     Never             580
PL/PgSQL    test_simple                  Never            2250
PL/PgSQL    test_unused_exception_block  Never            4200
PL/PgSQL    test_if_that_never_catches   Never            2600
PL/PgSQL    test_if_that_catches         Never             310
PL/PgSQL    test_if_that_catches         Every time       2750
PL/PgSQL    test_exception_that_catches  Never            4230
PL/PgSQL    test_exception_that_catches  Every time       3950
PL/PgSQL    test_constraint              Never             310
PL/PgSQL    test_constraint              Every time       2380

Note: I varied the number of iterations on the constraint-catching tests and, yes, the timing changes with it. So it doesn't appear that the loop breaks on the first error.
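
An aside that isn't reflected in the table above: one way to force the error-raising cases to actually run all 1,000,000 iterations is to trap the error in a throwaway wrapper (swallow_error_if_catches below is a made-up name, not one of the test functions). The wrapper adds its own PL/pgSQL call and EXCEPTION-block overhead, so treat its numbers as a rough sketch, comparable to each other rather than to the table.

-- Hypothetical wrapper (not part of the tests below): traps the error so the
-- loop keeps going instead of aborting on the first bad call.
create or replace function swallow_error_if_catches(p text)
returns int language plpgsql as $$
begin
    -- with p = '' this call raises (the text_not_empty domain check rejects it)
    return test_if_that_catches(p);
exception when others then
    return 0;   -- count the failed call, keep looping
end $$;

explain analyse
select sum(swallow_error_if_catches(''))   -- error raised and trapped on every call
from generate_series(1, 1000000);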

If you run the code yourself, you'll get different times...and the variability across multiple runs is pretty high. So, not the kinds of numbers you can use for more than a sense of things, I think.

Does anyone see anything completely off about the results here, or about how I calculated them? In my particular case, all of the numbers above read as "absolutely fine, it will make zero real-world difference." You need to run these things 1,000+ times to even get a millisecond of difference, give or take. I'm looking at error-checking for methods that are called sometimes...not a million times in a loop. My functions are going to spend their time doing real work, like searches, so the overhead of any of the approaches I tried smells inconsequential.

For me, the winner looks like test_if_that_catches: an IF at the start of the BEGIN block that catches bad inputs and then uses RAISE to report the problem. That's a good match for how I like to structure methods anyway, it's readable, and it's simple to raise custom exceptions that way.
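
To illustrate what "raise custom exceptions that way" can look like, here's a hedged sketch (not one of the timed functions; example_guarded and in_name are made-up names): RAISE accepts a USING clause that attaches an SQLSTATE, a hint, and so on to the error.

-- Illustrative only, not part of the timed tests
create or replace function example_guarded(in_name text)
returns int language plpgsql as $$
begin
    if in_name is null or in_name = '' then
        raise exception 'in_name must be a non-empty string'
            using errcode = 'invalid_parameter_value',
                  hint    = 'Pass a non-empty text value for in_name.';
    end if;

    return 1;   -- the real work would go here
end $$;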

I'll list out the functions, and then the test code.

--------------------------------------------
-- DOMAIN: text_not_empty
--------------------------------------------
DROP DOMAIN IF EXISTS text_not_empty;

CREATE DOMAIN text_not_empty AS
    text
    NOT NULL
    CHECK (value <> '');

COMMENT ON DOMAIN text_not_empty IS
    'The string must not be empty';
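
A quick hedged illustration (not part of the timed tests) of the short-circuit behaviour this domain gives test_constraint below: the CHECK fires when the argument is cast to text_not_empty, before the function body ever runs.

-- Illustrative only
select 'a'::text_not_empty;   -- ok, returns 'a'
select ''::text_not_empty;    -- raises a check-constraint violation at the cast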

--------------------------------------------
-- FUNCTION test_sql()
--------------------------------------------
drop function if exists test_sql();
create or replace function test_sql()
returns int as $$

select 1;
$$
LANGUAGE sql;

--------------------------------------------
-- FUNCTION test_simple()
--------------------------------------------
drop function if exists test_simple();
create or replace function test_simple()
returns int language plpgsql as $$
begin
    return 1;
end $$;

--------------------------------------------
-- FUNCTION test_unused_exception_block()
--------------------------------------------
drop function if exists test_unused_exception_block();
create or replace function test_unused_exception_block()
returns int language plpgsql as $$
begin
    return 1;
exception when others then
    raise exception 'ugh';
-- note: nothing in this function ever raises, so the exception block
-- never traps anything; even so, the function is much more expensive.
-- See the execution times in the query plans.
end $$;

--------------------------------------------
-- FUNCTION test_if_that_never_catches()
--------------------------------------------
drop function if exists test_if_that_never_catches();
create or replace function test_if_that_never_catches()
returns int language plpgsql as $$
begin
if 1 > 2 then
    raise exception 'You have an unusually high value for 1';
    -- This never happens, I'm following Klin's previous example,
    -- just trying to measure the overhead of the if...then..end if.
end if;

    return 1;
end $$;

--------------------------------------------
-- FUNCTION test_if_that_catches()
--------------------------------------------
drop function if exists test_if_that_catches(text_not_empty);
create or replace function test_if_that_catches(text_not_empty)
returns int language plpgsql as $$
begin
if $1 = '' then
    raise exception 'The string must not be empty';
end if;

    return 1;
end $$;

--------------------------------------------
-- FUNCTION test_exception_that_catches()
--------------------------------------------
drop function if exists test_exception_that_catches(text);
create or replace function test_exception_that_catches(text)
returns int language plpgsql as $$
begin
    return 1;
exception when others then
    raise exception 'The string must not be empty';
end $$;

--------------------------------------------
-- FUNCTION test_constraint()
--------------------------------------------
drop function if exists test_constraint(text_not_empty);
create or replace function test_constraint(text_not_empty)
returns int language plpgsql as $$
begin
    return 1;
end $$;


--------------------------------------------
-- Tests
--------------------------------------------
-- Run individually and look at execution time

explain analyse
select sum(test_sql())
from generate_series(1, 1000000);

explain analyse
select sum(test_simple())
from generate_series(1, 1000000);

explain analyse
select sum(test_unused_exception_block())
from generate_series(1, 1000000);

explain analyse
select sum(test_if_that_never_catches())
from generate_series(1, 1000000);

explain analyse
select sum(test_if_that_catches('')) -- Error thrown on every case
from generate_series(1, 1000000);

explain analyse
select sum(test_if_that_catches('a')) -- Error thrown on no cases
from generate_series(1, 1000000);

explain analyse
select sum(test_exception_that_catches(''))-- Error thrown on every case
from generate_series(1, 1000000);

explain analyse
select sum(test_exception_that_catches('a')) -- Error thrown on no cases
from generate_series(1, 1000000);

explain analyse
select sum(test_constraint('')) -- Error thrown on every case (domain check fails)
from generate_series(1, 1000000);

explain analyse
select sum(test_constraint('a')) -- Error thrown on no cases
from generate_series(1, 1000000); 

Answer 1:


Your tests look OK to me if all that you want to compare is the speed of various methods of verifying the correctness of inputs. Unsurprisingly, the methods that avoid calling the function at all win.

I concur with you that the difference is mostly irrelevant. Checking inputs is not what will decide whether your functions are efficient; that cost will get lost in the noise if the function does any real work.

Your effort is valiant, but your time might be better spent tuning the SQL statements that the function is going to execute.



Source: https://stackoverflow.com/questions/60066659/postgres-stored-function-input-checking-overhead-interpreting-timing-results
