Postgresql: query on jsonb column - index doesn't make it quicker

问题

There is a table in Postgresql 9.6, query on jsonb column is slow compared to a relational table, and adding a GIN index on it doesn't make it quicker.

Table:

-- create table
create table dummy_jsonb (
    id serial8,
    data jsonb,
    primary key (id)
);

-- create index
CREATE INDEX dummy_jsonb_data_index ON dummy_jsonb USING gin (data);
-- CREATE INDEX dummy_jsonb_data_index ON dummy_jsonb USING gin (data jsonb_path_ops);

Generate data:

-- generate data,
CREATE OR REPLACE FUNCTION dummy_jsonb_gen_data(n integer) RETURNS integer AS $$
DECLARE
    i integer:=1;
    name varchar;
    create_at varchar;
    json_str varchar;
BEGIN
    WHILE i<=n LOOP
        name:='dummy_' || i::text;
        create_at:=EXTRACT(EPOCH FROM date_trunc('milliseconds', now())) * 1000;
        json_str:='{
                 "name": "' || name || '",
                 "size": ' || i || ',
                 "create_at": ' || create_at || '
               }';

        insert into dummy_jsonb(data) values
        (json_str::jsonb
        );
        i:= i + 1;
    END LOOP;

    return n;
END;
$$ LANGUAGE plpgsql;

-- call function,
select dummy_jsonb_gen_data(1000000);

-- drop function,
DROP FUNCTION IF EXISTS dummy_jsonb_gen_data(integer);

Query:

select * from dummy_jsonb
where data->>'name' like 'dummy_%' and data->>'size' >= '500000'
order by data->>'size' desc
offset 50000 limit 10;

Test result:

The query takes 1.8 seconds on a slow vm.
Adding or removing the index, don't make a difference.
Changing to index gin with jsonb_path_ops, also don't make a difference.

Questions:

Is it possible to make the query quicker, either improve index or sql?
If not, the does it means, within pg a relational table is more proper in this case?
And, in my test, mongodb performs better, does that means mongodb is more proper for such storage & query?

回答1:

Quote from the manual

The default GIN operator class for jsonb supports queries with top-level key-exists operators ?, ?& and ?| operators and path/value-exists operator @> [...] The non-default GIN operator class jsonb_path_ops supports indexing the @> operator only.

Your query uses LIKE and string comparison with > (which is probably not correct to begin with), neither of those are supported by a GIN index.

But even an index on (data ->> 'name') wouldn't be used for the condition data->>'name' like 'dummy_%' as that is true for all rows because every name starts with dummy.

You can create a regular btree index on the name:

CREATE INDEX ON dummy_jsonb ( (data ->> 'name') varchar_pattern_ops);

Which will be used if the condition is restrictive enough, e.g.:

where data->>'name' like 'dummy_9549%'

If you need to query for the size, you can create an index on ((data ->> 'size')::int) and then use something like this:

where (data->>'size')::int >= 500000

However your use of limit and offset will always force the database to read all rows, sort them and the limit the result. This is never going to be very fast. You might want to read this article for more information why limit/offset is not very efficient.

JSON is a nice addition to the relational world, but only if you use it appropriately. If you don't need dynamic attributes for a row, then use standard columns and data types. Even though JSON support is Postgres is extremely good, this doesn't mean one should use it for everything, just because it's the current hype. Postgres is still a relational database and should be used as such.

Unrelated, but: your function to generate the test data can be simplified to a single SQL statement. You might not have been aware of the generate_series() function for things like that:

insert into dummy_jsonb(data)
select jsonb_build_object('name', 'dummy_'||i, 
                          'size', i::text, 
                          'created_at', (EXTRACT(EPOCH FROM date_trunc('milliseconds', clock_timestamp())) * 1000)::text)
from generate_series(1,1000000) as t(i);

回答2:

While a btree index (the standard PostgreSQL index based on binary trees) is able to optimize ordering-based queries like >= '500000', the gin index, using an inverted index structure, is meant to quickly find data containing specific elements (it is quite used e.g. for text search to find rows containing given words), so (AFAIK) it can't be used for the query you provide.

PostgreSQL docs on jsonb indexing indicates on which WHERE conditions the index may be applied. As pointed out there, you can create a btree index on specific elements in a jsonb column: indexes on the specific elements referenced in the WHERE clause should work for the query you indicate.

Also, as commented above, think whether you actually need JSON for your use case.

来源：https://stackoverflow.com/questions/48092484/postgresql-query-on-jsonb-column-index-doesnt-make-it-quicker

标签

postgresql

indexing

jsonb