Postgres choosing BTREE instead of BRIN index

一曲冷凌霜 提交于 2019-12-23 07:36:34

问题


I'm running Postgres 9.5 and am playing around with BRIN indexes. I have a fact table with about 150 million rows and I'm trying to get PG to use a BRIN index. My query is:

select sum(transaction_amt), 
       sum (total_amt) 
from fact_transaction 
where transaction_date_key between 20170101 and 20170201 

I created both a BTREE index and a BRIN index (default pages_per_range value of 128) on column transaction_date_key (the above query is referring to January to February 2017). I would have thought that PG would choose to use the BRIN index however it goes with the BTREE index. Here is the explain plan:

https://explain.depesz.com/s/uPI

I then deleted the BTREE index, did a vacuum / analyze on the the table, and re-ran the query and it did choose the BRIN index however the run time was considerably longer:

https://explain.depesz.com/s/5VXi

In fact my tests were all faster when using the BTREE index rather than the BRIN index. I thought it was supposed to be the opposite?

I'd prefer to use the BRIN index because of its smaller size however I can't seem to get PG to use it.

Note: I loaded the data, starting from January 2017 through to June 2017 (defined via transaction_date_key) as I read that physical table ordering makes a difference when using BRIN indexes.

Does anyone know why PG is choosing to use the BTREE index and why BRIN is so much slower in my case?


回答1:


It seems like the BRIN index scan is not very selective – it returns 30 million rows, all of which have to be re-checked, which is where the time is spent.

That probably means that transaction_date_key is not well correlated with the physical location of the rows in the table.

A BRIN index works by “lumping together” ranges of table blocks (how many can be configured with the storage parameter pages_per_range, whose default value is 128). The maximum and minimum of the indexed value for eatch range of blocks is stored.

So a lot of block ranges in your table contain transaction_date_key between 20170101 and 20170201, and all of these blocks have to be scanned to compute the query result.

I see two options to improve the situation:

  • Lower the pages_per_range storage parameter. That will make the index bigger, but it will reduce the number of “false positive” blocks.

  • Cluster the table on the transaction_date_key attribute. As you have found out, that requires (at least temporarily) a B-tree index on the column.



来源:https://stackoverflow.com/questions/42098803/postgres-choosing-btree-instead-of-brin-index

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!