Postgres_FDW not pushing down WHERE criteria

Posted by 耗尽温柔 on 2020-03-02 02:17:17

Question


I'm working with two PostgreSQL 9.6 databases and am trying to query one of them from the other using postgres_fdw (one is a production backup DB that holds the data, and the other is a DB for doing various analyses).

I've come across some odd behavior, though: certain types of WHERE clauses in a query aren't passed to the remote DB but are instead kept in the local DB and used to filter the results received from the remote DB. This makes the remote DB send far more data across the network than the local DB needs, and the affected queries are dramatically slower (15 minutes vs 15 seconds).

I've mostly seen this with timestamp-related clauses. The examples below are how I first came across the problem, but I've seen it in several other variations, such as replacing CURRENT_TIMESTAMP with a TIMESTAMP literal (slow) or a TIMESTAMP WITH TIME ZONE literal (fast).

Is there a setting somewhere that I'm missing that would help with this? I'm setting this up for a team with mixed levels of SQL experience, and most of them aren't used to reviewing EXPLAIN plans. I've come up with some workarounds (such as putting the relative time clauses in a sub-SELECT), but I keep coming across new instances of the problem.

An example:

SELECT      var_1
           ,var_2
FROM        schema_A.table_A
WHERE       execution_ts <= CURRENT_TIMESTAMP - INTERVAL '1 hour'
        AND execution_ts >= CURRENT_TIMESTAMP - INTERVAL '1 week' - INTERVAL '1 hour'
ORDER BY    var_1

Explain Plan

Sort  (cost=147.64..147.64 rows=1 width=1048)
  Output: table_A.var_1, table_A.var_2
  Sort Key: (table_A.var_1)::text
  ->  Foreign Scan on schema_A.table_A  (cost=100.00..147.63 rows=1 width=1048)
        Output: table_A.var_1, table_A.var_2
        Filter: ((table_A.execution_ts <= (now() - '01:00:00'::interval)) 
             AND (table_A.execution_ts >= ((now() - '7 days'::interval) - '01:00:00'::interval)))
        Remote SQL: SELECT var_1, execution_ts FROM model.table_A
                    WHERE ((model_id::text = 'ABCD'::text))
                      AND ((var_1 = ANY ('{1,2,3,4,5}'::bigint[])))

The above takes around 15-20 minutes to run, while the below completes in seconds.

SELECT      var_1
           ,var_2
FROM        schema_A.table_A
WHERE       execution_ts <= (SELECT CURRENT_TIMESTAMP - INTERVAL '1 hour')
        AND execution_ts >= (SELECT CURRENT_TIMESTAMP - INTERVAL '1 week' - INTERVAL '1 hour')
ORDER BY    var_1

Explain Plan

Sort  (cost=158.70..158.71 rows=1 width=16)
  Output: table_A.var_1, table_A.var_2
  Sort Key: table_A.var_1
  InitPlan 1 (returns $0)
    ->  Result  (cost=0.00..0.01 rows=1 width=8)
          Output: (now() - '01:00:00'::interval)
  InitPlan 2 (returns $1)
    ->  Result  (cost=0.00..0.02 rows=1 width=8)
          Output: ((now() - '7 days'::interval) - '01:00:00'::interval)
  ->  Foreign Scan on schema_A.table_A  (cost=100.00..158.66 rows=1 width=16)
        Output: table_A.var_1, table_A.var_2
        Remote SQL: SELECT var_1, var_2 FROM model.table_A
                    WHERE ((execution_ts <= $1::timestamp with time zone))
                      AND ((execution_ts >= $2::timestamp with time zone))
                      AND ((model_id::text = 'ABCD'::text))
                      AND ((var_1 = ANY ('{1,2,3,4,5}'::bigint[])))

Answer 1:


Any expression that contains a function that is not IMMUTABLE will not be pushed down. now(), which CURRENT_TIMESTAMP resolves to, is only STABLE.

See function is_foreign_expr in contrib/postgres_fdw/deparse.c:

/*
 * Returns true if given expr is safe to evaluate on the foreign server.
 */
bool
is_foreign_expr(PlannerInfo *root,
                RelOptInfo *baserel,
                Expr *expr)
{
...
    /*   
     * An expression which includes any mutable functions can't be sent over
     * because its result is not stable.  For example, sending now() remote
     * side could cause confusion from clock offsets.  Future versions might
     * be able to make this choice with more granularity.  (We check this last
     * because it requires a lot of expensive catalog lookups.)
     */
    if (contain_mutable_functions((Node *) expr))
        return false;

    /* OK to evaluate on the remote server */
    return true;
}
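
One way to see which functions trip this check is to look up their volatility in the pg_proc catalog. The query below is a minimal sketch: the function names in the list are my assumption about what the expressions in the question resolve to (now() for CURRENT_TIMESTAMP, timestamptz_mi_interval for timestamptz minus interval, timestamptz_le_timestamp for a timestamptz column compared with a TIMESTAMP literal), so adjust the list to your own query.

```sql
-- provolatile: 'i' = IMMUTABLE, 's' = STABLE, 'v' = VOLATILE.
-- Anything other than 'i' makes contain_mutable_functions() return true,
-- so postgres_fdw keeps the expression on the local side.
SELECT p.oid::regprocedure AS function_signature,
       p.provolatile
FROM   pg_catalog.pg_proc AS p
WHERE  p.proname IN ('now',
                     'timestamptz_mi_interval',   -- assumed: timestamptz - interval
                     'timestamptz_le_timestamp')  -- assumed: timestamptz <= timestamp
ORDER  BY p.proname;
```

now() is marked STABLE, which is why the conditions in the first query of the question end up under Filter instead of Remote SQL.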



Answer 2:


I think the problem may be the execution context of now() (which CURRENT_TIMESTAMP resolves to).
The value returned by now() is fixed for the current transaction, which means that it must be evaluated locally.

Wrapping it in the sub-select coerces it to a constant timestamptz value, allowing the comparison to be performed remotely.
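
If the team works in psql, another way to get a similar effect might be to compute the bounds once on the local side and feed them into the query as timestamptz literals, which the question reports are pushed down. This is only a sketch: the variable names upper_ts and lower_ts are made up for illustration, and \gset is a psql meta-command, so it will not work from other clients.

```sql
-- Evaluate the bounds locally and store them in psql variables
-- (upper_ts and lower_ts are hypothetical names).
SELECT now() - INTERVAL '1 hour'                     AS upper_ts,
       now() - INTERVAL '1 week' - INTERVAL '1 hour' AS lower_ts
\gset

-- :'name' interpolates the variable as a quoted literal, so the query
-- contains plain timestamptz constants instead of now().
EXPLAIN (VERBOSE)
SELECT      var_1
           ,var_2
FROM        schema_A.table_A
WHERE       execution_ts <= :'upper_ts'::timestamptz
        AND execution_ts >= :'lower_ts'::timestamptz
ORDER BY    var_1;
```

In the resulting plan, the two execution_ts conditions should appear under Remote SQL rather than Filter; if they do not, this workaround is not helping.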

Using a timestamp constant necessitates a conversion between timestamp and timestamptz, which is performed using the current time zone rules (as per SET TIME ZONE TO ....). If the database chooses to convert the remote timestamptz to local time for comparison with the timestamp literal, this again must be done locally.
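
The difference can be checked by comparing the plans for the two kinds of literal. This is a sketch that assumes execution_ts is a timestamptz column, as the casts in the second plan of the question suggest; the date is arbitrary and only for illustration.

```sql
-- timestamptz literal: same type as the column, so no time-zone-dependent
-- conversion is needed (the question reports this variant as fast).
EXPLAIN (VERBOSE)
SELECT var_1, var_2
FROM   schema_A.table_A
WHERE  execution_ts >= TIMESTAMP WITH TIME ZONE '2018-05-01 00:00:00+00';

-- timestamp literal: comparing it with a timestamptz column depends on the
-- session time zone (the question reports this variant as slow).
EXPLAIN (VERBOSE)
SELECT var_1, var_2
FROM   schema_A.table_A
WHERE  execution_ts >= TIMESTAMP '2018-05-01 00:00:00';
```

Look at where the execution_ts condition appears in each plan: under Remote SQL it is evaluated on the remote server, while under Filter it is applied locally after the rows have been transferred.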

In general, timestamp should be avoided (use timestamptz instead) except where you

  1. want the values to follow any changes made to daylight-saving rules, and
  2. are certain you will never want to represent timestamps during the added hour in the autumn


Source: https://stackoverflow.com/questions/50164775/postgres-fdw-not-pushing-down-where-criteria
