Question
In my database design, a lot of functions are used, and many of them are very slow. So I decided it might be a wise idea to create indexes on some of them in order to make execution a bit faster. However, I don't succeed in persuading PostgreSQL (9.6) to actually use my index.
Consider this table "user"
 id (integer) | name (jsonb)
--------------+--------------------------------------------------------------
            1 | {"last_names": ["Tester"], "first_names": ["Teddy","Eddy"]}
            2 | {"last_names": ["Miller"], "first_names": ["Lisa","Emma"]}
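For reference, a minimal setup matching the sample rows above would be (table and column names taken from the question):

CREATE TABLE "user" (
    id   integer PRIMARY KEY,
    name jsonb
);
INSERT INTO "user" (id, name) VALUES
    (1, '{"last_names": ["Tester"], "first_names": ["Teddy","Eddy"]}'),
    (2, '{"last_names": ["Miller"], "first_names": ["Lisa","Emma"]}');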
Often, I need the name as one string; that's done with a query like the following (I call the result "concat_name"):
SELECT array_to_string(jsonb_arr2text_arr(name->'last_names'), ' ') || ', ' || array_to_string(jsonb_arr2text_arr(name->'first_names'), ' ') FROM "user";
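The helper jsonb_arr2text_arr is not defined in the question; a typical definition of such a helper (an assumption, not necessarily the author's actual code) is:

-- convert a jsonb array into a native text[] array
CREATE OR REPLACE FUNCTION jsonb_arr2text_arr(jsonb)
RETURNS text[] AS
$$ SELECT ARRAY(SELECT jsonb_array_elements_text($1)) $$
LANGUAGE sql IMMUTABLE;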
I decided to put that functionality into a function, because it is used on multiple tables:
CREATE OR REPLACE FUNCTION public.concat_name(name jsonb)
RETURNS text AS
$BODY$
SELECT pg_sleep(50);  -- artificial delay, so the function is noticeably slow
SELECT array_to_string(jsonb_arr2text_arr(name->'last_names'), ' ') || ', ' || array_to_string(jsonb_arr2text_arr(name->'first_names'), ' ');
$BODY$
LANGUAGE sql IMMUTABLE SECURITY DEFINER
COST 100;
You see, to actually test whether it works, I've added an artificial delay. Now, I've created an index like:
CREATE INDEX user_concat_name_idx ON "user" (id, concat_name(name));
which succeeds and takes the expected time (because of the pg_sleep). I then run a query:
SELECT concat_name(name) FROM "user";
However, the index is not being used and the query is very slow. Instead, EXPLAIN tells me that the planner does a Seq Scan on "user".
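The plan can be checked with a plain EXPLAIN; the exact figures depend on the data, but the shape looks roughly like this (the cost/row numbers below are placeholders, not real output):

EXPLAIN SELECT concat_name(name) FROM "user";
--  Seq Scan on "user"  (cost=0.00..N rows=N width=W)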
I did a bit of research, and many people state that when the table is small, or when (almost) the whole table is being retrieved, the query planner assumes a sequential scan is cheaper than an index scan. However, in the case of functions, especially slow ones, that doesn't make any sense to me. Even if you query a table that contains only one row, using a function index could dramatically decrease execution time if the query includes a function that takes 50 seconds to execute each time.
So, in my opinion, the query planner has to compare the time it takes to look up the indexed value against the time it takes to execute the function. The size of the table, or how many rows the query returns, doesn't matter at all here. And if the function takes 50 seconds per execution, looking up the index should always win.
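What the planner actually knows about concat_name (its declared cost and volatility) can be inspected in the system catalog:

SELECT proname, procost, provolatile
FROM pg_proc
WHERE proname = 'concat_name';
-- procost defaults to 100 for SQL functions; provolatile must be 'i' (immutable)
-- for the expression index to be allowed at all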
So, what can I do here to make the query planner use the index instead of executing the function each time anew?
Answer 1:
First, the index on (id, concat_name(name)) makes no sense if you want to use it in a query where you select only concat_name(name). The index should be:
create index user_concat_name_idx on "user" (concat_name(name));
Second, the index will be used when it is needed, e.g. when you add order by concat_name(name):
explain analyse
select concat_name(name)
from "user"
order by 1;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using user_concat_name_idx on "user" (cost=0.42..29928.42 rows=100000 width=82) (actual time=0.157..1046.168 rows=100000 loops=1)
Planning time: 0.753 ms
Execution time: 1048.862 ms
(3 rows)
Additionally, you can make your function simpler and faster:
create or replace function concat_name(name jsonb)
returns text language sql immutable as $$
select concat_ws(', ',
(select string_agg(value, ' ')
from jsonb_array_elements_text(name->'last_names')),
(select string_agg(value, ' ')
from jsonb_array_elements_text(name->'first_names'))
)
$$;
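A quick check with one of the sample values from the question:

SELECT concat_name('{"last_names": ["Tester"], "first_names": ["Teddy","Eddy"]}'::jsonb);
-- Tester, Teddy Eddy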
What can I do here to make the query planner use the index instead of executing the function each time anew?
You should declare a larger cost of the function, e.g.:
create or replace function concat_name(name jsonb)
returns text language sql immutable as $$
-- ...
$$
cost 1000;
Per the documentation:
execution_cost
A positive number giving the estimated execution cost for the function, in units of cpu_operator_cost. If the function returns a set, this is the cost per returned row. If the cost is not specified, 1 unit is assumed for C-language and internal functions, and 100 units for functions in all other languages. Larger values cause the planner to try to avoid evaluating the function more often than necessary.
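If the function already exists, the cost can also be changed in place without recreating it:

ALTER FUNCTION concat_name(jsonb) COST 1000;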
Answer 2:
Normally, when you use a function such as upper() or lower() on a varchar/text column in a WHERE clause (e.g. together with LIKE), Postgres doesn't take your normal index into consideration and does a full scan of all the rows. You need an index built for this purpose. For example:
create index ix_tblname_col_upper on tblname (UPPER(col) varchar_pattern_ops);
Similarly you can also use text_pattern_ops on text columns.
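With such an index in place, a prefix search on the same expression can use it, for example (hypothetical table/column names as above, and a made-up search value):

SELECT *
FROM tblname
WHERE UPPER(col) LIKE 'SMITH%';  -- can use ix_tblname_col_upper for the prefix match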
Source: https://stackoverflow.com/questions/44023031/postgres-doesnt-use-index-for-slow-function