Question
I am using spring-data-jpa and postgresql-9.4.
There is a table, tbl_oplog, with about seven million rows, and the data needs to be displayed on the front end (paged).
I use Spring's PagingAndSortingRepository, and I found that the data query was very slow. From the logs, I found that two SQL queries were issued:
select
oplog0_.id as id1_8_,
oplog0_.deleted as deleted2_8_,
oplog0_.result_desc as result_d3_8_,
oplog0_.extra as extra4_8_,
oplog0_.info as info5_8_,
oplog0_.login_ipaddr as login_ip6_8_,
oplog0_.level as level7_8_,
oplog0_.op_type as op_type8_8_,
oplog0_.user_name as user_nam9_8_,
oplog0_.op_obj as op_obj10_8_,
oplog0_.op as op11_8_,
oplog0_.result as result12_8_,
oplog0_.op_time as op_time13_8_,
oplog0_.login_name as login_n14_8_
from
tbl_oplog oplog0_
where
oplog0_.deleted=false
order by
oplog0_.op_time desc limit 10
And:
select
count(oplog0_.id) as col_0_0_
from
tbl_oplog oplog0_
where
oplog0_.deleted=?
(The second SQL statement is used to populate the page object, which is necessary.)
I found the second statement to be very time-consuming. Why does it take so long?
How can I optimize it? Does this also happen with MySQL?
Or is there any other way I can meet this requirement? (It seems that select count is inevitable.)
EDIT: I'll use another table for the demonstration (the behavior is the same):
select count(*) from tbl_gather_log; -- count is 6300931, took 5.408 s
EXPLAIN select count(*) from tbl_gather_log:
Aggregate (cost=246566.58..246566.59 rows=1 width=0)
-> Index Only Scan using tbl_gather_log_pkey on tbl_gather_log (cost=0.43..230814.70 rows=6300751 width=0)
EXPLAIN ANALYSE select count(*) from tbl_gather_log:
Aggregate (cost=246566.58..246566.59 rows=1 width=0) (actual time=6697.102..6697.102 rows=1 loops=1)
-> Index Only Scan using tbl_gather_log_pkey on tbl_gather_log (cost=0.43..230814.70 rows=6300751 width=0) (actual time=0.173..4622.674 rows=6300936 loops=1)
Heap Fetches: 298
Planning time: 0.312 ms
Execution time: 6697.267 ms
EDIT2:
TABLE:
create table tbl_gather_log (
id bigserial not null primary key,
event_level int,
event_time timestamp,
event_type int,
event_dis_type int,
event_childtype int,
event_name varchar(64),
dev_name varchar(32),
dev_ip varchar(32),
sys_type varchar(16),
event_content jsonb,
extra jsonb
);
And:
Many filter criteria will probably be supported, so I can't simply special-case deleted. For example, a query such as select * from tbl_oplog where name like xxx and type = xxx limit 10 might be issued, which means there will also be a query select count(*) from tbl_oplog where name like xxx and type = xxx. Furthermore, I have to know exact counts, because I need to show how many pages there are on the front end.
Answer 1:
The second statement takes a long time because it has to scan the whole table in order to count the rows.
One thing you can do is use an index:
CREATE INDEX ON tbl_oplog (deleted) INCLUDE (id);
VACUUM tbl_oplog; -- so you get an index only scan
Assuming that id is the primary key (and therefore never NULL), count(*) returns the same result as count(id), so it would be much better to use count(*) and omit the INCLUDE clause from the index.
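Note that the INCLUDE clause was only added in PostgreSQL 11, so it is not available on the 9.4 server mentioned in the question. A minimal sketch of the count(*) variant that should work on 9.4 follows; the index name is made up for illustration, and whether the planner actually picks an index-only scan depends on the visibility map and the statistics.

```sql
-- Sketch for PostgreSQL 9.4, where INCLUDE is not available.
-- The index name tbl_oplog_deleted_idx is hypothetical.
CREATE INDEX tbl_oplog_deleted_idx ON tbl_oplog (deleted);

-- Refresh the visibility map and the statistics so that
-- an index-only scan becomes possible.
VACUUM (ANALYZE) tbl_oplog;

-- Check the plan and the real run time of the count query.
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*) FROM tbl_oplog WHERE deleted = false;
```

If almost all rows have deleted = false, the scan still has to visit most of the index, so the gain may be small. The EXPLAIN ANALYSE output in the question already shows an index-only scan that takes several seconds for 6.3 million rows.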
But the best is probably to use an estimate:
SELECT t.reltuples * freq.f AS estimated_rows
FROM pg_stats AS s
JOIN pg_namespace AS n
ON s.schemaname = n.nspname
JOIN pg_class AS t
ON s.tablename = t.relname
AND n.oid = t.relnamespace
CROSS JOIN LATERAL
unnest(s.most_common_vals::text::boolean[]) WITH ORDINALITY AS val(v,id)
JOIN LATERAL
unnest(s.most_common_freqs) WITH ORDINALITY AS freq(f,id)
USING (id)
WHERE s.tablename = 'tbl_oplog'
AND s.attname = 'deleted'
AND val.v = ?;
This uses the distribution statistics to estimate the desired count.
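The query above only covers a single boolean column. For the arbitrary filters mentioned in the question, a commonly used alternative is to let the planner produce the estimate and read the row count from the EXPLAIN output. The following is a minimal sketch under that assumption; the function name count_estimate and its variable names are chosen here for illustration, and the result is only as good as the table's statistics.

```sql
-- Hypothetical helper: returns the planner's row estimate for a query.
-- EXPLAIN (without ANALYZE) does not run the query itself.
-- The query text is concatenated as-is, so only pass trusted SQL.
CREATE FUNCTION count_estimate(query text) RETURNS bigint AS $$
DECLARE
    rec      record;
    estimate bigint;
BEGIN
    FOR rec IN EXECUTE 'EXPLAIN ' || query LOOP
        -- The first plan line carries the estimate for the whole query.
        estimate := substring(rec."QUERY PLAN" FROM ' rows=([[:digit:]]+)');
        EXIT WHEN estimate IS NOT NULL;
    END LOOP;
    RETURN estimate;
END;
$$ LANGUAGE plpgsql VOLATILE STRICT;

-- Usage: estimate the number of rows matching a filter.
SELECT count_estimate('SELECT 1 FROM tbl_oplog WHERE deleted = false');
```

The estimate comes back in roughly the planning time of the query rather than its execution time, at the cost of being approximate.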
If it is just about pagination, you don't need exact counts.
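One way to act on this, assuming the front end may show something like "more than 1000 pages" instead of an exact total, is to cap the count. The sketch below uses the page size of 10 from the question; the cap of 1000 pages is an arbitrary value picked for illustration.

```sql
-- Count at most 10001 matching rows: 1000 pages of 10 rows,
-- plus one sentinel row that signals "there are more".
SELECT count(*) AS capped_count
FROM (
    SELECT 1
    FROM tbl_oplog
    WHERE deleted = false
    LIMIT 10001
) AS capped;
```

A result of 10001 means the true count exceeds the cap; any smaller value is the exact count. The scan stops once the limit is reached, so the cost is bounded whenever enough rows match.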
Read my blog for more on the topic of counting in PostgreSQL.
Source: https://stackoverflow.com/questions/59319786/postgresl-select-count-time-consuming