PostgreSQL query runs faster with index scan, but engine chooses hash join

逝去的感伤 2020-12-08 05:35

The query:

SELECT "replays_game".*
FROM "replays_game"
INNER JOIN
 "replays_playeringame" ON "replays_game"."id" = "replays_playeringame"."game_

4 Answers
  •  自闭症患者
    2020-12-08 05:56

    This is an old post, but it was quite helpful, as I just ran into a similar issue.

    Here is what I have found so far. With 151208 rows in replays_game, the average cost of one index lookup is about ln(151208) ≈ 12. Since 3395 records remain in replays_playeringame after filtering, the total cost of the lookups is about 12*3395 = 40740, which is rather high. The planner also overestimates the page cost: it assumes the rows are randomly distributed across pages, while they are not. If they really were randomly distributed, a seq scan would be much better. So basically, the planner is trying to avoid the worst-case scenario.

    @dsjoerg's problem is that there is no index on replays_playeringame(game_id). If there were one, an index scan would always be used: the cost of scanning the index would become 3395+12 (or something close to that).
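
To make the two estimates in this answer concrete, here is a small sketch of the arithmetic. It follows the answer's back-of-the-envelope model only (one index lookup costs roughly ln(rows)); the numbers are not PostgreSQL planner cost units, and the names `probe_cost`, `repeated_lookup_cost`, and `single_scan_cost` are made up for illustration.

```python
import math

# Row counts quoted in the answer.
GAME_ROWS = 151208       # rows in replays_game
FILTERED_ROWS = 3395     # replays_playeringame rows left after filtering


def probe_cost(table_rows: int) -> int:
    """Assumed model: one index lookup costs about ln(table_rows), rounded."""
    return round(math.log(table_rows))


per_probe = probe_cost(GAME_ROWS)

# Without an index on replays_playeringame(game_id):
# one lookup into replays_game for each of the filtered rows.
repeated_lookup_cost = per_probe * FILTERED_ROWS

# With an index on replays_playeringame(game_id):
# one descent into the index plus a walk over the matching entries.
single_scan_cost = FILTERED_ROWS + per_probe

print("cost per lookup:      ", per_probe)
print("repeated lookups:     ", repeated_lookup_cost)
print("single index scan:    ", single_scan_cost)
```

Under this simplified model the repeated lookups come out roughly twelve times more expensive than the single scan, which is the gap the answer attributes to the missing index.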

    @Neil suggested an index on (player_id, game_id), which is close but not quite right. The right index is either (game_id) or (game_id, player_id).
