Kaminari is slow with COUNT(*) on a huge table in Postgres [duplicate]

Submitted by 时光怂恿深爱的人放手 on 2019-12-24 00:49:06

Question


I'm using the Kaminari gem to paginate a query on a large table (~1.5 million rows). Fetching the actual result pages is quick (~20 ms), but the SELECT COUNT(*) WHERE ... that Kaminari adds is excruciatingly slow and adds several seconds to the execution time.

Is there a way to approximate the number of results instead?


Answer 1:


Quick estimate for whole table

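Your example hints at addresses. Say we have a table called adr in the schema public. For a very quick estimate of the row count of the whole table, read the number the planner keeps in the system catalog (it is updated by VACUUM and ANALYZE, so it can lag behind the true count):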

SELECT reltuples FROM pg_class WHERE oid = 'public.adr'::regclass;

More details in this related answer:
How do I speed up counting rows in a PostgreSQL table?
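
Because reltuples is only as fresh as the last VACUUM or ANALYZE, one way to compensate is to scale the stored rows-per-page ratio by the current size of the table. The following is a minimal sketch, assuming the same public.adr table; the alias estimated_rows is just an illustrative name, and the result is not meaningful for a table that has never been vacuumed or analyzed.

```sql
-- Estimate = (rows per page at last ANALYZE) * (pages the table occupies now).
-- NULLIF avoids a division by zero for an empty or never-analyzed table;
-- the query then returns NULL instead of raising an error.
SELECT (c.reltuples / NULLIF(c.relpages, 0)
        * (pg_relation_size(c.oid) / current_setting('block_size')::int)
       )::bigint AS estimated_rows
FROM   pg_class c
WHERE  c.oid = 'public.adr'::regclass;
```

This reads only catalog data and the file size, so it returns almost instantly regardless of how big the table is.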

Count with condition(s)

For a count with a condition, Postgres can use indexes to make it faster. Index-only scans ("covering indexes"), added in Postgres 9.2, improve this further, but certain requirements have to be met to profit from them. More in the Postgres Wiki about Index-only scans.

For queries with conditions on city and state, this multicolumn index would help a lot, if the conditions are selective (only a small percentage of the rows meet the condition):

CREATE INDEX adr_foo_idx ON adr (city, state);
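
To see whether the planner actually picks the index, and whether it manages an index-only scan, you can inspect the query plan. This is a sketch under the assumption that the adr table and the index above exist; the VACUUM is there because index-only scans depend on the visibility map being reasonably current.

```sql
-- Refresh the visibility map and the planner statistics first.
VACUUM (ANALYZE) adr;

-- Run the count and show the plan that was actually executed.
-- Look for "Index Only Scan using adr_foo_idx" and a low "Heap Fetches" number.
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*)
FROM   adr
WHERE  city = 'New York'
AND    state = 'NY';
```

If the plan still shows a sequential scan, the conditions are probably not selective enough for the index to pay off.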

If you have a small set of typical conditions, you might even use partial indexes:

CREATE INDEX adr_ny_ny_idx ON adr(adr_id)
WHERE  city = 'New York'
AND    state = 'NY';

... one for every set of (state, city)

Or a combination of both:

CREATE INDEX adr_ny_idx ON adr (city)
WHERE  state = 'NY';

... one per state
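
Keep in mind that a partial index is only considered when the WHERE clause of the query implies the index predicate. A small illustration, assuming the adr_ny_idx index defined above (the second city and state are made-up values):

```sql
-- Can use adr_ny_idx: state = 'NY' matches the index predicate.
SELECT count(*)
FROM   adr
WHERE  city = 'New York'
AND    state = 'NY';

-- Cannot use adr_ny_idx: rows for other states are not in that index.
SELECT count(*)
FROM   adr
WHERE  city = 'Springfield'
AND    state = 'IL';
```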

Normalize

Of course, everything that makes your big table (and its indexes) smaller helps. Lookup tables for states and cities would go a long way to cut down on redundant storage. The key word here is normalization.

Instead of:

CREATE TABLE adr (
  adr_id serial PRIMARY KEY
 ,state text
 ,city text
 ...
 );

SELECT count(*)
FROM   adr
WHERE  city = 'New York'
AND    state = 'NY';

Normalize your database design and use proper indexes:

CREATE TABLE state (
  state_id serial PRIMARY KEY
 ,state text UNIQUE
 );

CREATE TABLE city (
  city_id serial PRIMARY KEY
 ,state_id int REFERENCES state
 ,city text
 ,UNIQUE (state_id, city)
 );

CREATE TABLE adr (
  adr_id serial PRIMARY KEY
 ,city_id int REFERENCES city
  ...
 );

CREATE INDEX adr_city_idx ON adr (city_id);

SELECT count(*)
FROM   state s
JOIN   city  c USING (state_id)
JOIN   adr   a USING (city_id)
WHERE  s.state = 'NY'
AND    c.city  = 'New York'

The table and its index become smaller, and integer comparisons are cheaper than text comparisons, so the count has less data to read and process.

Materialized view

On top of that, if performance is crucial, and since you do not need exact counts, you could use a materialized view with counts for the relevant conditions. Refresh the view at events or times of your choosing to keep the numbers reasonably up to date. See CREATE MATERIALIZED VIEW and REFRESH MATERIALIZED VIEW in the manual for details. Materialized views require Postgres 9.3, but in older versions you can implement the same idea manually with a regular table that you repopulate.
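
A minimal sketch of such a view, assuming the normalized state, city and adr tables from above; the names adr_count_mv, adr_count_mv_idx and ct are hypothetical and chosen only for illustration:

```sql
-- Precompute one count per (state, city) combination.
CREATE MATERIALIZED VIEW adr_count_mv AS
SELECT s.state, c.city, count(*) AS ct
FROM   state s
JOIN   city  c USING (state_id)
JOIN   adr   a USING (city_id)
GROUP  BY s.state, c.city;

-- Makes lookups fast; a unique index is also what
-- REFRESH ... CONCURRENTLY (Postgres 9.4 or later) requires.
CREATE UNIQUE INDEX adr_count_mv_idx ON adr_count_mv (state, city);

-- Recompute the counts, for example from a scheduled job.
REFRESH MATERIALIZED VIEW adr_count_mv;

-- Reading a precomputed count is a single index lookup.
SELECT ct
FROM   adr_count_mv
WHERE  state = 'NY'
AND    city  = 'New York';
```

The counts are only as current as the last refresh, which fits the question's tolerance for approximate numbers.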



Source: https://stackoverflow.com/questions/21839330/kaminari-is-slow-with-count-on-a-huge-table-in-postgres
