PostgreSQL Nested Loops - When does the planner decide to use a Nested Loop when doing an INNER JOIN?

Posted by 一个人想着一个人 on 2020-01-30 08:14:23

Question


I am running a query with an INNER JOIN for which the planner decides to use a Nested Loop. I've figured out that it has to do with the WHERE conditions: I tried writing the query with different WHERE conditions so that it returns the same result but does not use a Nested Loop.

My question is: why does the planner make different decisions when the queries appear to be equivalent, since both return the same result? The query runs in 77 seconds with the Nested Loop and in 13 seconds without it. The 13-second query is quite ugly and inelegant, which makes me think there is a better way to write it.

Here are the two queries. The only difference between them is how the WHERE clause filters by date: the first uses BETWEEN and the second uses a series of OR conditions. I am aware that it's strange that each current_date expression is wrapped in its own subquery, but that is because these queries use foreign data wrappers. Wrapping it this way allows current_date to be passed as an immutable value, which greatly speeds up performance.

SELECT ROUND(AVG(m.forecast - w.wind),6) FROM pjm.wind_forecast_recent w
    INNER JOIN pjm.load_forecast_recent m ON w.pricedate = m.pricedate AND w.hour = m.hour
  WHERE w.hour = 5 AND m.area = 'RTO_COMBINED' AND 
            (w.pricedate BETWEEN (SELECT current_date-6) AND (SELECT current_date));

-----------

SELECT ROUND(AVG(m.forecast - w.wind),6) FROM pjm.wind_forecast_recent w
    INNER JOIN pjm.load_forecast_recent m ON w.pricedate = m.pricedate AND w.hour = m.hour
  WHERE w.hour = 5 AND m.area = 'RTO_COMBINED' AND (
    w.pricedate = (SELECT current_date-6) OR
    w.pricedate = (SELECT current_date-5) OR
    w.pricedate = (SELECT current_date-4) OR
    w.pricedate = (SELECT current_date-3) OR
    w.pricedate = (SELECT current_date-2) OR
    w.pricedate = (SELECT current_date-1) OR
    w.pricedate = (SELECT current_date));

And here are the respective EXPLAIN ANALYZE outputs, first for the BETWEEN query and then for the OR query:

Aggregate  (cost=842341.01..842341.02 rows=1 width=32) (actual time=77120.088..77120.089 rows=1 loops=1)
  InitPlan 1 (returns $0)
    ->  Result  (cost=0.00..0.01 rows=1 width=4) (actual time=0.007..0.008 rows=1 loops=1)
  InitPlan 2 (returns $1)
    ->  Result  (cost=0.00..0.01 rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=1)
  ->  Nested Loop  (cost=840333.25..842340.97 rows=1 width=18) (actual time=14719.661..77119.994 rows=7 loops=1)
        ->  Foreign Scan on wind_forecast_recent w  (cost=242218.45..242218.49 rows=1 width=18) (actual time=3184.714..3184.720 rows=7 loops=1)
        ->  Foreign Scan on load_forecast_recent m  (cost=598114.80..600122.47 rows=1 width=16) (actual time=10531.723..10531.724 rows=1 loops=7)
Planning Time: 744.979 ms
Execution Time: 77227.512 ms

-----------

Aggregate  (cost=841657.94..841657.95 rows=1 width=32) (actual time=13683.022..13683.023 rows=1 loops=1)
  InitPlan 1 (returns $0)
    ->  Result  (cost=0.00..0.01 rows=1 width=4) (actual time=0.006..0.006 rows=1 loops=1)
  InitPlan 2 (returns $1)
    ->  Result  (cost=0.00..0.01 rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=1)
  InitPlan 3 (returns $2)
    ->  Result  (cost=0.00..0.01 rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=1)
  InitPlan 4 (returns $3)
    ->  Result  (cost=0.00..0.01 rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=1)
  InitPlan 5 (returns $4)
    ->  Result  (cost=0.00..0.01 rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=1)
  InitPlan 6 (returns $5)
    ->  Result  (cost=0.00..0.01 rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=1)
  InitPlan 7 (returns $6)
    ->  Result  (cost=0.00..0.01 rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=1)
  ->  Foreign Scan  (cost=833725.15..841657.83 rows=1 width=18) (actual time=13682.974..13682.977 rows=7 loops=1)
        Relations: (pjm.wind_forecast_recent w) INNER JOIN (pjm.load_forecast_recent m)
Planning Time: 332.870 ms
JIT:
  Functions: 16
  Options: Inlining true, Optimization true, Expressions true, Deforming true
  Timing: Generation 4.163 ms, Inlining 15.088 ms, Optimization 44.489 ms, Emission 28.064 ms, Total 91.804 ms
Execution Time: 13724.094 ms

I am running PostgreSQL 12.1 on an Ubuntu 18.04 server.

Let me know if you have any further questions. Thanks!


Answer 1:


The planner does not choose a join strategy through deep reasoning; it simply constructs all possible join strategies, estimates the cost of each, and chooses the cheapest one.

That said, nested loop joins are usually the best choice if the outer table is small, so that the inner loop does not have to be executed often. Also, an index on the join condition of the inner table can greatly reduce the cost of a nested loop join and make it an attractive strategy.
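To see this cost comparison for yourself, one option is to look at the estimated plan with and without nested loops available. The following is only a sketch: it reuses the first query from the question and the standard `enable_nestloop` planner setting, and the alternative plan you get (and its cost) will depend on your tables and statistics.

```sql
-- Sketch: compare estimated costs with and without nested loop joins.
-- EXPLAIN without ANALYZE only plans the query; it does not run it.
BEGIN;

-- Plan the planner picks by default.
EXPLAIN
SELECT ROUND(AVG(m.forecast - w.wind), 6)
FROM pjm.wind_forecast_recent w
INNER JOIN pjm.load_forecast_recent m
        ON w.pricedate = m.pricedate AND w.hour = m.hour
WHERE w.hour = 5
  AND m.area = 'RTO_COMBINED'
  AND w.pricedate BETWEEN (SELECT current_date - 6) AND (SELECT current_date);

-- Discourage nested loops for the rest of this transaction only.
SET LOCAL enable_nestloop = off;

-- Same query again: the top-level cost is what the planner considered
-- the next-cheapest alternative.
EXPLAIN
SELECT ROUND(AVG(m.forecast - w.wind), 6)
FROM pjm.wind_forecast_recent w
INNER JOIN pjm.load_forecast_recent m
        ON w.pricedate = m.pricedate AND w.hour = m.hour
WHERE w.hour = 5
  AND m.area = 'RTO_COMBINED'
  AND w.pricedate BETWEEN (SELECT current_date - 6) AND (SELECT current_date);

-- SET LOCAL is undone when the transaction ends.
ROLLBACK;
```

This is a diagnostic aid, not a fix: turning `enable_nestloop` off globally would also penalize the queries where a nested loop really is the best plan.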

In your case, the bad choice is due to a mis-estimate:

Foreign Scan on wind_forecast_recent w  (cost=... rows=1 ...) (actual ... rows=7 ...)

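That causes the inner loop to be executed 7 times rather than once. Each execution of the inner foreign scan takes about 10.5 seconds (actual time=10531.723..10531.724, loops=7), so the execution time is over 70 seconds rather than about 10.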
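As a rough check, the arithmetic can be done with the "actual time" figures from the nested loop plan in the question. The per-loop time reported by EXPLAIN ANALYZE is an average, so this is only an approximation of the Nested Loop node's total.

```sql
-- One outer scan plus seven executions of the inner scan (times in ms,
-- copied from the nested loop plan in the question).
SELECT 3184.720 + 7 * 10531.724 AS approx_nested_loop_ms;
-- approx_nested_loop_ms = 76906.788
```

That is close to the 77119.994 ms reported for the Nested Loop node, so nearly all of the run time is spent repeating the inner foreign scan.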

You should collect table statistics on wind_forecast_recent:

ANALYZE pjm.wind_forecast_recent;

Remember that autovacuum's automatic ANALYZE does not process foreign tables; you have to take care of that yourself.
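A minimal sketch of doing that by hand and then checking that statistics were stored locally. Analyzing `load_forecast_recent` as well is an assumption on my part (it is the other side of the join); the catalog query only reads the standard `pg_class` and `pg_namespace` tables.

```sql
-- Collect local statistics for both foreign tables in the join.
-- This samples rows from the remote side, so it can take a while.
ANALYZE pjm.wind_forecast_recent;
ANALYZE pjm.load_forecast_recent;

-- Check the row counts the planner now has on record.
-- relkind = 'f' restricts the result to foreign tables.
SELECT c.relname, c.reltuples
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname = 'pjm'
  AND c.relkind = 'f'
  AND c.relname IN ('wind_forecast_recent', 'load_forecast_recent');
```

Because nothing refreshes these statistics automatically, the ANALYZE would need to be repeated (for example from a scheduled job) whenever the remote data changes significantly.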

If that doesn't do the trick, you can try setting the use_remote_estimate option on the foreign table and make sure that the table statistics are accurate on the remote database.
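A sketch of setting that option, assuming the foreign tables are defined with postgres_fdw (the wrapper that provides `use_remote_estimate`; the question does not name the wrapper explicitly):

```sql
-- Ask the remote server for row and cost estimates at planning time
-- instead of relying only on local statistics.
-- Use SET instead of ADD if the option is already defined on the table.
ALTER FOREIGN TABLE pjm.wind_forecast_recent
    OPTIONS (ADD use_remote_estimate 'true');

ALTER FOREIGN TABLE pjm.load_forecast_recent
    OPTIONS (ADD use_remote_estimate 'true');
```

With this option the planner runs EXPLAIN on the remote server while planning, which adds some planning time, and the estimates are only as good as the remote statistics. That is why the underlying tables on the remote database should be analyzed as well.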



Source: https://stackoverflow.com/questions/59942397/postgressql-nested-loops-when-does-the-planner-decide-to-use-nested-loop-when