What to index on queries with lots of columns in the WHERE clause

我寻月下人不归 2020-12-31 16:52

I'm building a search engine for an apartment site, and I'm not sure how to index the apartments table.

Example of queries:

  • ...WHERE c
2 Answers
  • 2020-12-31 17:27

    You have to figure out which WHERE clauses you are going to use with this query, how often each will occur, and how selective each condition will be.

    • Don't index for queries that occur seldom unless you have to.

    • Use multicolumn indexes, starting with those columns that will occur in an = comparison.

    • Concerning the order of columns in a multicolumn index, put first the columns that will also be used in queries on their own (an index can be used for a query that references only some of its columns, provided they are the leading columns of the index).

    • You might omit columns with low selectivity, like gender.
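    One rough way to gauge selectivity is to compute, on a sample of rows, the expected fraction of rows that an equality condition on a column matches (a lower fraction means a more selective column). A minimal Python sketch, assuming a hypothetical in-memory sample; the values and the helper name `avg_match_fraction` are made up for illustration, and PostgreSQL's planner works from its own statistics (visible in `pg_stats`) rather than this exact formula:

```python
from collections import Counter

# Hypothetical sample rows; only the column names echo the thread.
city_ids = [1, 2, 3, 4, 5, 6, 7, 8]
room_counts = [1, 2, 2, 3, 3, 3, 4, 4]
genders = ["f", "m"] * 4
rows = [
    {"city_id": c, "rooms": r, "gender": g}
    for c, r, g in zip(city_ids, room_counts, genders)
]


def avg_match_fraction(rows, column):
    """Expected fraction of rows matched by `column = v` when v is picked
    with the same frequencies as the column's values: sum of (count/n)^2."""
    n = len(rows)
    counts = Counter(row[column] for row in rows)
    return sum((c / n) ** 2 for c in counts.values())


# Most selective (smallest expected fraction) first.
ranking = sorted(["city_id", "rooms", "gender"],
                 key=lambda col: avg_match_fraction(rows, col))
print(ranking)  # ['city_id', 'rooms', 'gender']
```

    In this sample an equality on gender matches half the rows on average, so an index on it buys little, while city_id narrows the result to an eighth.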

    For example, with your above queries, if they are all frequent and all columns are selective, these indexes would be good:

    ... ON apartments (city_id, rooms, size)
    
    ... ON apartments (area_id, ad_type, price)
    
    ... ON apartments (area_id, ad_type, published_at)
    

    These indexes can also be used for WHERE clauses that contain only area_id or only city_id, because those are the leading columns.
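    Which index columns a given WHERE clause can actually use follows the rule stated in the PostgreSQL documentation for multicolumn B-tree indexes: equality conditions on leading columns, plus an inequality condition on the first column that has no equality condition, limit the part of the index that is scanned. A simplified Python model of that rule (the function name is hypothetical; conditions on later columns can still be checked inside the index, they just don't shrink the scanned range, and the model ignores the B-tree skip scan added in PostgreSQL 18):

```python
def scan_limiting_columns(index_cols, eq_cols, range_cols=frozenset()):
    """Return the leading index columns whose WHERE conditions narrow a
    B-tree scan: equality columns from the left, then at most one column
    with a range (<, >, BETWEEN) condition, stopping at the first gap."""
    used = []
    for col in index_cols:
        if col in eq_cols:
            used.append(col)
        elif col in range_cols:
            used.append(col)  # a range condition ends the usable prefix
            break
        else:
            break  # no condition on this column: later columns can't narrow the scan
    return tuple(used)


idx = ("area_id", "ad_type", "price")
print(scan_limiting_columns(idx, {"area_id"}))                        # ('area_id',)
print(scan_limiting_columns(idx, {"area_id", "ad_type"}, {"price"}))  # ('area_id', 'ad_type', 'price')
print(scan_limiting_columns(idx, {"ad_type"}))                        # ()
```

    The last call shows why a query filtering only on ad_type would not benefit from this index in the model: its leading column, area_id, has no condition.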

    But it is bad to have too many indexes: every index takes up space and slows down data modification.

    If the above method would lead to too many indexes, e.g. because the user can pick arbitrary columns for the WHERE clause, it is better to index individual columns or occasionally pairs of columns that regularly go together.

    That way PostgreSQL can pick a bitmap index scan to combine several indexes for one query. That is less efficient than a regular index scan, but usually better than a sequential scan.
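    Conceptually, each single-column index yields a bitmap of matching rows; the bitmaps are ANDed (or ORed) and the table is then visited in physical order. A toy Python sketch of that idea, assuming hypothetical rows and using a Python int as the bitmap; real PostgreSQL bitmaps track tuple IDs per page and can degrade to lossy, page-level bitmaps when `work_mem` is too small:

```python
# Hypothetical rows; list position stands in for the row's physical location.
rows = [
    {"area_id": 3, "ad_type": "rent"},
    {"area_id": 3, "ad_type": "sale"},
    {"area_id": 5, "ad_type": "rent"},
    {"area_id": 3, "ad_type": "rent"},
    {"area_id": 7, "ad_type": "sale"},
]


def bitmap_index_scan(rows, column, value):
    """Stand-in for scanning a single-column index: set bit i for each
    row i where rows[i][column] == value."""
    bits = 0
    for i, row in enumerate(rows):
        if row[column] == value:
            bits |= 1 << i
    return bits


def bitmap_heap_scan(rows, bits):
    """Visit matching rows in physical order, as a bitmap heap scan does."""
    return [rows[i] for i in range(len(rows)) if (bits >> i) & 1]


# WHERE area_id = 3 AND ad_type = 'rent'  ->  AND of two index bitmaps
both = bitmap_index_scan(rows, "area_id", 3) & bitmap_index_scan(rows, "ad_type", "rent")
print(bin(both))  # 0b1001 -> rows 0 and 3
print(bitmap_heap_scan(rows, both))
```

    An OR condition would combine the same bitmaps with `|` instead of `&`.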

  • 2020-12-31 17:45

    Postgres 9.6 provides a new extension to address your conundrum precisely:

    bloom index

    From the same authors who brought trigram indexes or text search to Postgres (among other things).

    A single bloom index on all involved columns works well for any combination of them in the WHERE clause, even if not as well as separate btree indexes on each column. But a single index is much smaller and cheaper to maintain than many indexes. You'll have to weigh costs and benefits.

    A bloom index excels when there are many index columns that can be combined in many ways.
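    Why one bloom index can serve any combination of equality conditions: each row is stored as a small fixed-size bit signature to which every indexed column contributes a few bits; a query builds a signature from its own conditions, every stored signature is checked, and the candidates are rechecked against the table because signatures are lossy. A minimal Python sketch of that idea; `SIG_BITS = 80` and `BITS_PER_COL = 2` mirror the defaults of the bloom module's `length` and per-column (`col1` to `col32`) parameters, but the hashing scheme and all names here are made up for illustration and do not reproduce PostgreSQL's implementation:

```python
import hashlib

SIG_BITS = 80      # signature length in bits (bloom's default `length`)
BITS_PER_COL = 2   # bits each column sets (bloom's default per-column setting)


def bit_positions(col_no, value):
    """Pick BITS_PER_COL positions for (column number, value).
    SHA-256 is an arbitrary choice for this sketch."""
    positions = []
    for k in range(BITS_PER_COL):
        digest = hashlib.sha256(f"{col_no}:{k}:{value}".encode()).digest()
        positions.append(int.from_bytes(digest[:8], "big") % SIG_BITS)
    return positions


def signature(values_by_col):
    """OR together the bits of every (column number -> value) pair."""
    sig = 0
    for col_no, value in values_by_col.items():
        for pos in bit_positions(col_no, value):
            sig |= 1 << pos
    return sig


def bloom_search(rows, row_sigs, conditions):
    """conditions: {column number: value}, all ANDed equality predicates."""
    query_sig = signature(conditions)
    # Check every stored signature; false positives are possible ...
    candidates = [i for i, sig in enumerate(row_sigs) if (sig & query_sig) == query_sig]
    # ... so recheck each candidate against the actual row values.
    return [i for i in candidates
            if all(rows[i][col] == val for col, val in conditions.items())]


# Hypothetical rows as (area_id, city_id, size, rooms, ad_type) tuples.
rows = [
    (3, 10, 55, 2, 1),
    (3, 11, 70, 3, 2),
    (5, 10, 55, 2, 1),
]
row_sigs = [signature(dict(enumerate(row))) for row in rows]

# WHERE area_id = 3 AND rooms = 2  (column numbers 0 and 3)
print(bloom_search(rows, row_sigs, {0: 3, 3: 2}))    # [0]
# WHERE city_id = 10 AND size = 55  (column numbers 1 and 2)
print(bloom_search(rows, row_sigs, {1: 10, 2: 55}))  # [0, 2]
```

    The same signatures answer both queries, which is the "catch-all" property; the price is reading every signature and rechecking candidates.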

    I might combine a bloom index as "catch-all" with some tailored multicolumn btree indexes to optimize the most common combinations (along the guidelines provided by @Laurenz) and some single column indexes on the most frequently queried columns.

    Some more explanation:

    • Is a composite index also good for queries on the first field?

    The feature is new and there are some important limitations. Quoting the manual:

    • Only operator classes for int4 and text are included with the module.

    • Only the = operator is supported for search. But it is possible to add support for arrays with union and intersection operations in the future.

    So it is not an option for published_at, which looks like a date (though you could still extract the EPOCH as an integer and index that), and it only helps with equality predicates.
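    If exact matches on published_at ever matter, the timestamp has to become an integer that fits the module's int4 operator class. A minimal Python sketch of that conversion (the helper name is hypothetical; in SQL the equivalent starting point is `extract(epoch FROM published_at)` cast to integer). A 32-bit epoch only reaches 2038-01-19 03:14:07 UTC, and range conditions such as "published in the last week" still need a btree index:

```python
from datetime import datetime, timezone

INT4_MIN, INT4_MAX = -2**31, 2**31 - 1


def epoch_int4(ts: datetime) -> int:
    """Whole seconds since 1970-01-01 00:00 UTC, checked against the int4 range."""
    if ts.tzinfo is None:
        raise ValueError("use a timezone-aware datetime to avoid local-time surprises")
    secs = int(ts.timestamp())
    if not INT4_MIN <= secs <= INT4_MAX:
        raise OverflowError(f"{ts.isoformat()} does not fit in int4")
    return secs


print(epoch_int4(datetime(2020, 12, 31, 16, 52, tzinfo=timezone.utc)))  # 1609433520
print(epoch_int4(datetime(2038, 1, 19, 3, 14, 7, tzinfo=timezone.utc)))  # 2147483647
```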

    After creating the extension (once per DB):

    CREATE EXTENSION bloom;
    

    Create a bloom index:

    CREATE INDEX tbl_bloomidx
    ON tbl USING bloom (area_id, city_id, size, rooms, ad_type);  -- many more columns?
    

    And some others:

    CREATE INDEX tbl_published_at ON tbl (published_at);
    CREATE INDEX tbl_price ON tbl (price);
    -- some popular combinations...
    

    The manual has some examples comparing bloom, multicolumn and single-column btree indexes. Very insightful.
