Changing ORDER BY from id to another indexed column (with low LIMIT) has a huge cost

落爺英雄遲暮, submitted on 2019-12-06 02:38:37

It turned out to be an index issue. The NULLS ordering requested by the query did not match the NULLS ordering of the index.

CREATE INDEX message_created_at_idx on message (created_at DESC NULLS LAST);

... ORDER BY message.created_at DESC; -- defaults to NULLS FIRST when DESC

Solutions

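If you specify NULLS in your index or in your query, make sure the two are coherent with each other.

i.e. an ASC NULLS LAST index is coherent with ORDER BY ... ASC NULLS LAST or with ORDER BY ... DESC NULLS FIRST, because the latter is the same index read backward.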
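The rule above can be checked with EXPLAIN. The following is a minimal sketch for PostgreSQL, not the asker's schema: the table nulls_demo and its index name are hypothetical and exist only for illustration. Which plan the planner picks depends on table size and statistics, so treat the comments as what one would typically expect rather than guaranteed output.

```sql
-- Hypothetical demo table (assumption, not part of the original question).
CREATE TEMP TABLE nulls_demo (id serial PRIMARY KEY, created_at timestamptz);

INSERT INTO nulls_demo (created_at)
SELECT now() - g * interval '1 minute'
FROM   generate_series(1, 100000) g;

-- A plain index is sorted ASC NULLS LAST (the default for ASC).
CREATE INDEX nulls_demo_created_at_idx ON nulls_demo (created_at);
ANALYZE nulls_demo;

-- Same order as the index read forward: ASC NULLS LAST.
EXPLAIN SELECT * FROM nulls_demo ORDER BY created_at ASC NULLS LAST LIMIT 25;

-- Same order as the index read backward: DESC defaults to NULLS FIRST.
EXPLAIN SELECT * FROM nulls_demo ORDER BY created_at DESC LIMIT 25;

-- Matches neither scan direction, so the index cannot supply this order
-- and a separate sort step is typically needed.
EXPLAIN SELECT * FROM nulls_demo ORDER BY created_at DESC NULLS LAST LIMIT 25;
```

If the third statement shows a Sort node while the first two show an index scan, the mismatch described here is the likely cause.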

NULLS LAST

CREATE INDEX message_created_at_idx on message (created_at DESC NULLS LAST);

... ORDER BY message.created_at DESC NULLS LAST;

NULLS FIRST

CREATE INDEX message_created_at_idx on message (created_at DESC); -- defaults to NULLS FIRST when DESC

... ORDER BY message.created_at DESC; -- defaults to NULLS FIRST when DESC

NOT NULL

If your column is NOT NULL, don't bother with NULLS.

CREATE INDEX message_created_at_idx on message (created_at DESC);

... ORDER BY message.created_at DESC;
Erwin Brandstetter

Fix your query

Your WHERE condition is on a table that's joined via LEFT JOIN nodes. The WHERE condition forces the joins to behave like [INNER] JOIN. That's pointless and may confuse the query planner, especially with a query that has a lot of tables and therefore many possible query plans. By setting that right, you reduce the number of possible query plans drastically, making it easier for Postgres to find a good one.
More details in the answer to the additionally spawned question.

SELECT m0_.id AS id0, ...
FROM   site            s3_
JOIN   listing         l2_ ON l2_.site_id = s3_.id
JOIN   conversation    c1_ ON c1_.listing_id = l2_.id
JOIN   message         m0_ ON m0_.conversation_id = c1_.id

LEFT   JOIN user_      u4_ ON u4_.id = l2_.poster_id
LEFT   JOIN user_      u5_ ON u5_.id = m0_.author_user_id
LEFT   JOIN guest_data g6_ ON g6_.id = m0_.author_guest_id
WHERE  s3_.id = '287'  -- ??
ORDER  BY m0_.created_at DESC
LIMIT  25

Why s3_.id = '287'?

Looks like 287 should be an integer type, which you would typically enter as a numeric constant without quotes: 287. What's the actual data type (and why)? Only a minor problem either way.
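The point about a WHERE condition forcing LEFT JOIN to behave like [INNER] JOIN can be seen on a toy example. This is only a sketch: the tables site_demo and listing_demo are hypothetical stand-ins, not the asker's real tables.

```sql
-- Hypothetical minimal tables (assumption, for illustration only).
CREATE TEMP TABLE site_demo    (id int PRIMARY KEY);
CREATE TEMP TABLE listing_demo (id int PRIMARY KEY, site_id int);

INSERT INTO site_demo    VALUES (287), (300);
INSERT INTO listing_demo VALUES (1, 287), (2, 300), (3, NULL);

-- LEFT JOIN alone keeps listing 3, padded with a NULL site row.
SELECT l.id, s.id AS site_id
FROM   listing_demo l
LEFT   JOIN site_demo s ON s.id = l.site_id;

-- A WHERE condition on the joined table rejects the NULL-padded rows,
-- so this query ...
SELECT l.id
FROM   listing_demo l
LEFT   JOIN site_demo s ON s.id = l.site_id
WHERE  s.id = 287;

-- ... returns the same rows as the plain inner join.
SELECT l.id
FROM   listing_demo l
JOIN   site_demo s ON s.id = l.site_id
WHERE  s.id = 287;
```

Writing the join as a plain JOIN states the intent directly instead of leaving the planner to work it out.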

Reading the query plan

@FuzzyTree already hinted (quite accurately) that sorting on a different column than what's used in your WHERE clause complicates things. But that's not the elephant in the room here.

The combination with LIMIT 25 is what makes the difference so large. Both query plans show a reduction from rows=124616 to rows=25 in their last step.

Both query plans also show: Seq Scan on site s3_ ... rows=1. So if you ORDER BY s3_.id in your fast variant, you are not actually ordering anything, while the other query has to find the top 25 rows out of 124616 candidates ... Hardly a fair comparison.
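The unfair comparison can be reproduced in isolation. The sketch below assumes a single hypothetical table, topn_demo, in which every row belongs to the same site. It only mimics the row count from the plans above and is not the asker's schema. Actual plans and timings will vary with the installation.

```sql
-- Hypothetical demo data (assumption): 124616 rows, all for site 287.
CREATE TEMP TABLE topn_demo AS
SELECT g                               AS id,
       287                             AS site_id,
       now() - g * interval '1 minute' AS created_at
FROM   generate_series(1, 124616) g;
ANALYZE topn_demo;

-- All qualifying rows share one site_id, so this ORDER BY imposes no real
-- ordering: any 25 matching rows will do and the scan can stop early.
EXPLAIN ANALYZE
SELECT id FROM topn_demo
WHERE  site_id = 287
ORDER  BY site_id
LIMIT  25;

-- Here every candidate row has to be considered before the 25 newest
-- ones are known (there is no index on created_at in this sketch).
EXPLAIN ANALYZE
SELECT id FROM topn_demo
WHERE  site_id = 287
ORDER  BY created_at DESC
LIMIT  25;
```

Comparing the row counts of the node directly below Limit in the two plans shows how much work each variant really does.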

Solution

After clarification, the problem seems clearer. You are selecting a huge number of rows by one criterion, but ordering by another. No conventional index design can cover this, not even if both columns were to reside in the same table (which they don't).

I think we found a (non-trivial) solution for this class of problems under this related question on dba.SE:

Of course, all the usual advice for query optimization and general performance optimization applies.

In your first query, your WHERE and ORDER BY are both on id, so it can take advantage of the same index, whereas your second query has different columns for your WHERE and ORDER BY.

Try adding a composite index so that the same index can be used for both your WHERE and your ORDER BY:

CREATE INDEX myIndex ON message (id,created_at);