Optimization of count query for PostgreSQL

前端 未结 3 821
南笙
南笙 2021-01-05 08:44

I have a table in postgresql that contains an array which is updated constantly.

In my application i need to get the number of rows for which a specific parameter is

3条回答
  •  忘掉有多难
    2021-01-05 09:36

    Is there an existing pattern I’m not aware of that applies to this situation? what would be the best approach for this?

    Your best bet in this situation might be to normalize your schema. Split the array out into a table. Add a b-tree index on the table of properties, or order the primary key so it's efficiently searchable by property_id.

    CREATE TABLE demo( id integer primary key );
    INSERT INTO demo (id) SELECT id FROM arrtable;
    CREATE TABLE properties (
      demo_id integer not null references demo(id),
      property integer not null,
      primary key (demo_id, property)
    );
    CREATE INDEX properties_property_idx ON properties(property);
    

    You can then query the properties:

    SELECT count(id) 
    FROM demo 
    WHERE NOT EXISTS (
      SELECT 1 FROM properties WHERE demo.id = properties.demo_id AND property = 1
    )
    

    I expected this to be a lot faster than the original query, but it's actually much the same with the same sample data; it runs in the same 2s to 3s range as your original query. It's the same issue where searching for what is not there is much slower than searching for what is there; if we're looking for rows containing a property we can avoid the seqscan of demo and just scan properties for matching IDs directly.

    Again, a seq scan on the array-containing table does the job just as well.

提交回复
热议问题