Question
I have a table:
create table big_table (
id serial primary key,
-- other columns here
vote int
);
This table is very big, approximately 70 million rows. I need to query:
SELECT * FROM big_table
ORDER BY vote [ASC|DESC], id [ASC|DESC]
OFFSET x LIMIT n -- I need this for pagination
As you may know, when x is a large number, queries like this are very slow.
For performance optimization I added indexes:
create index vote_order_asc on big_table (vote asc, id asc);
and
create index vote_order_desc on big_table (vote desc, id desc);
EXPLAIN shows that the above SELECT query uses these indexes, but it's very slow anyway with a large offset.
What can I do to optimize queries with OFFSET in big tables? Maybe PostgreSQL 9.5 or even newer versions have some features? I've searched but didn't find anything.
Answer 1:
A large OFFSET is always going to be slow. Postgres has to order all rows and count the visible ones up to your offset. To skip all previous rows directly you could add an indexed row_number to the table (or create a MATERIALIZED VIEW including said row_number) and work with WHERE row_number > x instead of OFFSET x.
However, this approach is only sensible for read-only (or mostly) data. Implementing the same for table data that can change concurrently is more challenging. You need to start by defining desired behavior exactly.
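The row_number idea can be sketched as follows. This is only an illustration under stated assumptions: Python's built-in sqlite3 module stands in for PostgreSQL so the snippet runs anywhere (it needs an SQLite build with window functions, 3.25 or newer), a plain snapshot table named big_table_ranked (a hypothetical name) plays the role of the MATERIALIZED VIEW, and the 100 sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE big_table (id INTEGER PRIMARY KEY, vote INTEGER NOT NULL)")
# Made-up sample data: 100 rows with many duplicate votes.
conn.executemany(
    "INSERT INTO big_table (id, vote) VALUES (?, ?)",
    [(i, (i * 7) % 5) for i in range(1, 101)],
)

# Snapshot with a precomputed row number (stand-in for a materialized view).
# It goes stale as soon as big_table changes and must then be rebuilt.
conn.execute(
    """
    CREATE TABLE big_table_ranked AS
    SELECT row_number() OVER (ORDER BY vote, id) AS rn, id, vote
    FROM big_table
    """
)
conn.execute("CREATE UNIQUE INDEX big_table_ranked_rn ON big_table_ranked (rn)")


def page_by_offset(offset, limit):
    """Classic pagination: the database walks past `offset` rows first."""
    return conn.execute(
        "SELECT id, vote FROM big_table ORDER BY vote, id LIMIT ? OFFSET ?",
        (limit, offset),
    ).fetchall()


def page_by_row_number(offset, limit):
    """Same page, but located directly through the index on rn."""
    return conn.execute(
        "SELECT id, vote FROM big_table_ranked WHERE rn > ? ORDER BY rn LIMIT ?",
        (offset, limit),
    ).fetchall()
```

Both functions return the same page for a given offset as long as the snapshot is current, which is why the approach fits read-only or mostly read-only data.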
I suggest a different approach for pagination:
SELECT *
FROM big_table
WHERE (vote, id) > (vote_x, id_x) -- ROW values
ORDER BY vote, id -- needs to be deterministic
LIMIT n;
Where vote_x and id_x are from the last row of the previous page (for both DESC and ASC). Or from the first if navigating backwards.
Comparing row values is supported by the index you already have - a feature that complies with the ISO SQL standard, but not every RDBMS supports it.
CREATE INDEX vote_order_asc ON big_table (vote, id);
Or for descending order:
SELECT *
FROM big_table
WHERE (vote, id) < (vote_x, id_x) -- ROW values
ORDER BY vote DESC, id DESC
LIMIT n;
This can use the same index.
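As a rough end-to-end sketch of this pagination style, the snippet below walks a whole table page by page using only the last row of the previous page as the cursor. It is an illustration under assumptions: Python's built-in sqlite3 module stands in for PostgreSQL (SQLite has supported row values since 3.15), the helper names next_page and all_pages are hypothetical, and the sample data is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE big_table (id INTEGER PRIMARY KEY, vote INTEGER NOT NULL)")
conn.execute("CREATE INDEX vote_order_asc ON big_table (vote, id)")
# Made-up sample data: 100 rows with many duplicate votes.
conn.executemany(
    "INSERT INTO big_table (id, vote) VALUES (?, ?)",
    [(i, (i * 7) % 5) for i in range(1, 101)],
)


def next_page(last_row, limit):
    """Fetch the page after last_row, a (vote, id) tuple, or the first page if None."""
    if last_row is None:
        sql = "SELECT vote, id FROM big_table ORDER BY vote, id LIMIT ?"
        params = (limit,)
    else:
        sql = (
            "SELECT vote, id FROM big_table "
            "WHERE (vote, id) > (?, ?) "  # row-value comparison
            "ORDER BY vote, id LIMIT ?"
        )
        params = (last_row[0], last_row[1], limit)
    return conn.execute(sql, params).fetchall()


def all_pages(limit):
    """Walk every page, carrying only the last row of the previous page."""
    rows, last_row = [], None
    while True:
        page = next_page(last_row, limit)
        if not page:
            return rows
        rows.extend(page)
        last_row = page[-1]


keyset_rows = all_pages(7)
full_scan = conn.execute("SELECT vote, id FROM big_table ORDER BY vote, id").fetchall()
```

Concatenating the pages reproduces the fully sorted result, with no row skipped or repeated, even though many rows share the same vote. The cost of each page no longer depends on how deep into the result it is; the trade-off is that you can only step to the next or previous page, not jump to an arbitrary page number.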
I suggest you declare your columns NOT NULL or acquaint yourself with the NULLS FIRST|LAST construct:
- PostgreSQL sort by datetime asc, null first?
Note two things in particular:
- The ROW values in the WHERE clause cannot be replaced with separated member fields. WHERE (vote, id) > (vote_x, id_x) cannot be replaced with:
WHERE vote >= vote_x AND id > id_x
That would rule out all rows with id <= id_x, while we only want to do that for the same vote and not for the next. The correct translation would be:
WHERE (vote = vote_x AND id > id_x) OR vote > vote_x
... which doesn't play along with indexes as nicely, and gets increasingly complicated for more columns.
Would be simple for a single column, obviously. That's the special case I mentioned at the outset.
- The technique does not work for mixed directions in ORDER BY like:
ORDER BY vote ASC, id DESC
At least I can't think of a generic way to implement this as efficiently. If at least one of the two columns is a numeric type, you could use a functional index with an inverted value on (vote, (id * -1)) and use the same expression in ORDER BY:
ORDER BY vote ASC, (id * -1) ASC
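Both notes can be checked on a small table. The sketch below is illustrative only: Python's built-in sqlite3 module stands in for PostgreSQL, the cursor row (vote_x, id_x) = (2, 51), the helper ids and the index name vote_asc_id_desc are hypothetical, and the data is made up. It compares the row-value predicate with the naive and the expanded translation, then pages in mixed order through the negated id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE big_table (id INTEGER PRIMARY KEY, vote INTEGER NOT NULL)")
# Expression index with the inverted id, as suggested for mixed directions.
conn.execute("CREATE INDEX vote_asc_id_desc ON big_table (vote, (id * -1))")
# Made-up sample data: 100 rows with many duplicate votes.
conn.executemany(
    "INSERT INTO big_table (id, vote) VALUES (?, ?)",
    [(i, (i * 7) % 5) for i in range(1, 101)],
)

vote_x, id_x = 2, 51  # hypothetical last row of the previous page


def ids(where, params):
    """Return the ids matching a WHERE clause, in (vote, id) order."""
    sql = f"SELECT id FROM big_table WHERE {where} ORDER BY vote, id"
    return [row[0] for row in conn.execute(sql, params)]


# Note 1: three ways to write "rows after (vote_x, id_x)".
row_value = ids("(vote, id) > (?, ?)", (vote_x, id_x))
naive = ids("vote >= ? AND id > ?", (vote_x, id_x))  # wrong: drops rows
expanded = ids("(vote = ? AND id > ?) OR vote > ?", (vote_x, id_x, vote_x))

# Note 2: ORDER BY vote ASC, id DESC expressed through the negated id,
# so that both sort keys ascend and a row-value comparison applies again.
mixed = conn.execute(
    "SELECT vote, id FROM big_table "
    "WHERE (vote, id * -1) > (?, ?) "
    "ORDER BY vote ASC, (id * -1) ASC LIMIT 5",
    (vote_x, id_x * -1),
).fetchall()
```

Here expanded matches row_value exactly, while naive returns only a subset: it loses every row with a higher vote whose id is not above id_x. The mixed query returns the rows that follow the cursor when sorting by vote ascending and id descending.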
Related:
- SQL syntax term for 'WHERE (col1, col2) < (val1, val2)'
- Improve performance for order by with columns from many tables
Note in particular the presentation by Markus Winand I linked to:
- "Pagination done the PostgreSQL way"
Answer 2:
Have you tried partitioning the table?
Ease of management, improved scalability and availability, and a reduction in blocking are common reasons to partition tables. Improving query performance is not a reason to employ partitioning, though it can be a beneficial side-effect in some cases. In terms of performance, it is important to ensure that your implementation plan includes a review of query performance. Confirm that your indexes continue to appropriately support your queries after the table is partitioned, and verify that queries using the clustered and nonclustered indexes benefit from partition elimination where applicable.
http://sqlperformance.com/2013/09/sql-indexes/partitioning-benefits
Source: https://stackoverflow.com/questions/34110504/optimize-query-with-offset-on-large-table