Slow performance for deeply nested subquery factoring (CTE)

喜夏-厌秋 提交于 2019-11-30 13:17:59

Q1: Seems that there are nothing about calculation time, just bug in optimizer algorithm which make it mad while calculating a best execution plan.

Q2: There are a number of known and fixed bugs in Oracle 11.X.0.X related to optimization of nested queries and query factoring. But it's very hard to find a concrete issue.

Q3: There are two undocumented hints: materialize and inline but no one of them works for me while I tried your example. It's possible that some changes in server configuration or upgrading to 11.2.0.3 may increase limit of nested with clauses: for me (on 11.2.0.3 Win7/x86) your example works fine, but increasing number of nested tables to 30 hangs a session.

Workaround may look like this:

select k from (
select k, avg(k) over (partition by null) k_avg from ( --t16
  select k, avg(k) over (partition by null) k_avg from ( --t15
    select k, avg(k) over (partition by null) k_avg from ( --t14
      select k, avg(k) over (partition by null) k_avg from ( --t13
        select k, avg(k) over (partition by null) k_avg from ( --t12
          select k, avg(k) over (partition by null) k_avg from ( --t11
            select k, avg(k) over (partition by null) k_avg from ( --t10
              select k, avg(k) over (partition by null) k_avg from ( --t9
                select k, avg(k) over (partition by null) k_avg from ( --t8
                  select k, avg(k) over (partition by null) k_avg from ( --t7
                    select k, avg(k) over (partition by null) k_avg from ( --t6
                      select k, avg(k) over (partition by null) k_avg from ( --t5
                        select k, avg(k) over (partition by null) k_avg from ( --t4
                          select k, avg(k) over (partition by null) k_avg from ( --t3
                            select k, avg(k) over (partition by null) k_avg from ( --t2
                              select k, avg(k) over (partition by null) k_avg from ( -- t1
                                select k, avg(k) over (partition by null) k_avg from (select 0 as k from dual) t0
                              ) where k >= k_avg
                            ) where k >= k_avg
                          ) where k >= k_avg
                        ) where k >= k_avg
                      ) where k >= k_avg
                    ) where k >= k_avg
                  ) where k >= k_avg
                ) where k >= k_avg
              ) where k >= k_avg
            ) where k >= k_avg
          ) where k >= k_avg
        ) where k >= k_avg
      ) where k >= k_avg
    ) where k >= k_avg
  ) where k >= k_avg
) where k >= k_avg
)

At least it works for me on nesting level of 30 and produces totally different execution plan with WINDOW BUFFER and VIEW instead of LOAD TABLE AS SELECT, SORT AGGREGATE and TABLE ACCESS FULL.

Update

  1. Just installed 11.2.0.4 (Win7/32bit) and test it against initial query. Nothing changed in optimizer behavior.

  2. There are no possibility to directly affect a CBO behavior, even with use of inline (undocumented) or RULE (deprecated) hints. May be some Guru knows a some variant, but it's a Top Secret for me (and Google too :-) .

  3. Doing things in a one select statement in reasonable time is possible if a main select statement separated into a parts and placed into the function which returns a set of rows (function returning sys_refcursor or strong typed cursor), but it's not a choice if a query constructed at runtime.

  4. Workaround with usage of XML is possible, but this variant looks like removing a tonsil through the ass hole (sorry):

.

select
  extractvalue(column_value,'/t/somevalue') abc
from 
  table(xmlsequence((
    select t2 from (
      select
        t0,
        t1,
        (   
          select xmlagg(
                   xmlelement("t", 
                     xmlelement("k1",extractvalue(t1t.column_value,'/t/k1')), 
                     xmlelement("somevalue", systimestamp))
                  )
          from 
            table(xmlsequence(t0)) t0t, 
            table(xmlsequence(t1)) t1t  
          where 
            extractvalue(t1t.column_value,'/t/k1') >= (
              select avg(extractvalue(t1t.column_value, '/t/k1')) from table(xmlsequence(t1))
            )                                              
            and 
            extractvalue(t0t.column_value,'/t/k2') > 6
        ) t2
      from (
        select
          t0,
          (
            select xmlagg(
                     xmlelement("t", 
                       xmlelement("k1",extractvalue(column_value,'/t/k1')), 
                       xmlelement("somevalue", sysdate))
                    )
            from table(xmlsequence(t0))   
            where 
              extractvalue(column_value,'/t/k1') >= (
                select avg(extractvalue(column_value, '/t/k1')) from table(xmlsequence(t0))
              )
          ) t1
        from (
          select
            xmlagg(xmlelement("t", xmlelement("k1", level), xmlelement("k2", level + 3))) t0
          from dual connect by level < 5
        )
      )
    )
  )))

Another thing about a strange code above is that this variant applicable only if with data sets didn't have a big number of rows.

(This is isn't a full answer. Hopefully the information here will help someone else produce a better answer.)

Q1: The optimizer rewrites the query by inlining everything. The internal statement doubles in size with each new common table expression and the statements quickly grow ginormous. For example, T15 produces a 3,162,172 character query.

Code to trace the statements:

sqlplus user/pass@orcl

alter session set events '10053 trace name context forever, level 1';

with t0 as (select 0 as k from dual)
,t1 as (select k from t0 where k >= (select avg(k) from t0))
,t2 as (select k from t1 where k >= (select avg(k) from t1))
select k from t2;

exit;


sqlplus user/pass@orcl

alter session set events '10053 trace name context forever, level 1';

with t0 as (select 0 as k from dual)
,t1 as (select k from t0 where k >= (select avg(k) from t0))
,t2 as (select k from t1 where k >= (select avg(k) from t1))
,t3 as (select k from t2 where k >= (select avg(k) from t2))
select k from t3;
exit;

If you compare the two trace files there are many differences, but most of them look very minor. The real difference is in only a single line that comes right after the string: Stmt: ******* UNPARSED QUERY IS *******. Be careful opening the trace files if you trace larger queries. Not all editors can handle such huge lines. And the T20 file was 250MB!

SQL from first trace, after formatting:

SELECT "T1"."K" "K"
  FROM (SELECT 0 "K"
          FROM "SYS"."DUAL" "DUAL"
         WHERE 0 >= (SELECT AVG(0) "AVG(K)" FROM "SYS"."DUAL" "DUAL")) "T1"
 WHERE "T1"."K" >=
       (SELECT AVG("T1"."K") "AVG(K)"
          FROM (SELECT 0 "K"
                  FROM "SYS"."DUAL" "DUAL"
                 WHERE 0 >= (SELECT AVG(0) "AVG(K)" FROM "SYS"."DUAL" "DUAL")) "T1")

SQL from second trace, after formatting:

SELECT "T2"."K" "K"
  FROM (SELECT "T1"."K" "K"
          FROM (SELECT 0 "K"
                  FROM "SYS"."DUAL" "DUAL"
                 WHERE 0 >= (SELECT AVG(0) "AVG(K)" FROM "SYS"."DUAL" "DUAL")) "T1"
         WHERE "T1"."K" >=
               (SELECT AVG("T1"."K") "AVG(K)"
                  FROM (SELECT 0 "K"
                          FROM "SYS"."DUAL" "DUAL"
                         WHERE 0 >=
                               (SELECT AVG(0) "AVG(K)" FROM "SYS"."DUAL" "DUAL")) "T1")) "T2"
 WHERE "T2"."K" >=
       (SELECT AVG("T2"."K") "AVG(K)"
          FROM (SELECT "T1"."K" "K"
                  FROM (SELECT 0 "K"
                          FROM "SYS"."DUAL" "DUAL"
                         WHERE 0 >=
                               (SELECT AVG(0) "AVG(K)" FROM "SYS"."DUAL" "DUAL")) "T1"
                 WHERE "T1"."K" >=
                       (SELECT AVG("T1"."K") "AVG(K)"
                          FROM (SELECT 0 "K"
                                  FROM "SYS"."DUAL" "DUAL"
                                 WHERE 0 >= (SELECT AVG(0) "AVG(K)"
                                               FROM "SYS"."DUAL" "DUAL")) "T1")) "T2")

Q2: I wouldn't say every "complex" common table expression would behave the same way. I've seen CTEs that were much larger. It's only the extreme nesting that seems to be the problem. I couldn't find any obvious bugs on Oracle Support.

ThinkJet's code looks like a good work around. Nesting inline views is more common than nesting common table expressions.

Q3: There probably is a hint to prevent this behavior, but I'm not sure what it is. Hopefully by showing the transformed version of the query someone else can guess how to fix it.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!