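Anyone with some understanding of SQL tuning knows that the core of SQL optimization is reducing physical I/O. Put plainly, we want to scan tables, above all large tables, as few times as possible.
The subqueries discussed today are SQL statements that contain IN, NOT IN, EXISTS, or NOT EXISTS. People often ask how to choose between IN and EXISTS. Some say EXISTS performs better; others say the choice depends on the data volume of the inner and outer tables. In my view these rules of thumb are one-sided. To find out which is better, look at the execution plan and the execution time.
When a statement contains IN, NOT IN, EXISTS, or NOT EXISTS, the optimizer tries to rewrite it. Why? Because these constructs can produce an operation called FILTER. Sound familiar? I covered it in an earlier article; if you missed it, look back through my earlier posts.
Let's look at an example: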
SQL> explain plan for
select EMPNO,ENAME
from emp
where exists
(select DEPTNO from dept
where
emp.DEPTNO = dept.DEPTNO
and dept.loc = 'BOSTON'
union all
select DEPTNO from dept
where
emp.DEPTNO = dept.DEPTNO
and dept.loc = 'DALLAS'
);

Explained.

SQL> set linesize 400
SQL> SELECT * FROM TABLE(dbms_xplan.display);

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------
Plan hash value: 1230022885

-----------------------------------------------------------------------------------------
| Id  | Operation                     | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |         |     5 |    65 |     9   (0)| 00:00:01 |
|*  1 |  FILTER                       |         |       |       |            |          |
|   2 |   TABLE ACCESS FULL           | EMP     |    14 |   182 |     3   (0)| 00:00:01 |
|   3 |   UNION-ALL                   |         |       |       |            |          |
|*  4 |    TABLE ACCESS BY INDEX ROWID| DEPT    |     1 |    11 |     1   (0)| 00:00:01 |
|*  5 |     INDEX UNIQUE SCAN         | PK_DEPT |     1 |       |     0   (0)| 00:00:01 |
|*  6 |    TABLE ACCESS BY INDEX ROWID| DEPT    |     1 |    11 |     1   (0)| 00:00:01 |
|*  7 |     INDEX UNIQUE SCAN         | PK_DEPT |     1 |       |     0   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter( EXISTS ( (SELECT "DEPTNO" FROM "DEPT" "DEPT" WHERE
              "DEPT"."DEPTNO"=:B1 AND "DEPT"."LOC"='BOSTON') UNION ALL (SELECT "DEPTNO" FROM
              "DEPT" "DEPT" WHERE "DEPT"."DEPTNO"=:B2 AND "DEPT"."LOC"='DALLAS')))
   4 - filter("DEPT"."LOC"='BOSTON')
   5 - access("DEPT"."DEPTNO"=:B1)
   6 - filter("DEPT"."LOC"='DALLAS')
   7 - access("DEPT"."DEPTNO"=:B1)

SQL> explain plan for
select EMPNO,ENAME
from emp
where DEPTNO in
(select DEPTNO from dept
where
dept.loc = 'BOSTON'
union all
select DEPTNO from dept
where
dept.loc = 'DALLAS'
);

Explained.

SQL> set linesize 400
SQL> SELECT * FROM TABLE(dbms_xplan.display);

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------
Plan hash value: 1230022885

-----------------------------------------------------------------------------------------
| Id  | Operation                     | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |         |     9 |   117 |     9   (0)| 00:00:01 |
|*  1 |  FILTER                       |         |       |       |            |          |
|   2 |   TABLE ACCESS FULL           | EMP     |    14 |   182 |     3   (0)| 00:00:01 |
|   3 |   UNION-ALL                   |         |       |       |            |          |
|*  4 |    TABLE ACCESS BY INDEX ROWID| DEPT    |     1 |    11 |     1   (0)| 00:00:01 |
|*  5 |     INDEX UNIQUE SCAN         | PK_DEPT |     1 |       |     0   (0)| 00:00:01 |
|*  6 |    TABLE ACCESS BY INDEX ROWID| DEPT    |     1 |    11 |     1   (0)| 00:00:01 |
|*  7 |     INDEX UNIQUE SCAN         | PK_DEPT |     1 |       |     0   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter( EXISTS ( (SELECT "DEPTNO" FROM "DEPT" "DEPT" WHERE "DEPTNO"=:B1
              AND "DEPT"."LOC"='BOSTON') UNION ALL (SELECT "DEPTNO" FROM "DEPT" "DEPT" WHERE
              "DEPTNO"=:B2 AND "DEPT"."LOC"='DALLAS')))
   4 - filter("DEPT"."LOC"='BOSTON')
   5 - access("DEPTNO"=:B1)
   6 - filter("DEPT"."LOC"='DALLAS')
   7 - access("DEPTNO"=:B1)

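This pattern, exists/in (xxx union all yyyy), should look familiar. Some of you have surely written SQL like this, and I have seen it more than once. My guess is that nobody checked the execution plan, because with a small data volume nothing matters and even terrible SQL gets by. But we have to think about the future: once the data grows, the SQL stops finishing.
Now let's analyze the execution plan. See the FILTER at Id=1 and its bind variable :B1? Pay attention to it. Pay attention to it. Pay attention to it. Important things are worth saying three times.
   1 - filter( EXISTS ( (SELECT "DEPTNO" FROM "DEPT" "DEPT" WHERE
              "DEPT"."DEPTNO"=:B1 AND "DEPT"."LOC"='BOSTON') UNION ALL (SELECT "DEPTNO" FROM
              "DEPT" "DEPT" WHERE "DEPT"."DEPTNO"=:B2 AND "DEPT"."LOC"='DALLAS')))
What does this predicate tell us?
It tells us that emp passes each value of the join key deptno down to the dept subquery below it, and the subquery is then executed for that value. Suppose emp.deptno is unique, with no duplicates, and there are 10 million rows. How many times does the part below execute? How many times is dept accessed? 10 million times. If dept were 10 GB and each of those executions had to scan it, that would be 10 million scans of 10 GB. Would your SQL ever finish?
Don't bother waiting. Cancel it.
I recommend rewriting this pattern. The first approach is to pull the union all apart and write the whole statement as two parts. As for exactly how, try it yourself and see what effect it has.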
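One possible shape for that split is sketched below. It is an assumption about what "two parts" means, not a plan that was tested for this article: each branch of the original union all becomes its own top-level query with a simple single-table EXISTS, which the optimizer is generally freer to unnest. The top-level union all is only safe here because a department has a single loc (DEPTNO is the key behind PK_DEPT), so no emp row can satisfy both branches. The trade-off is that emp is now read twice, so check the resulting plan before adopting it.

```sql
-- Sketch: one query per branch, combined at the top level
-- instead of inside the EXISTS subquery.
select EMPNO, ENAME
from emp
where exists
(select 1 from dept
 where emp.DEPTNO = dept.DEPTNO
 and dept.loc = 'BOSTON')
union all
select EMPNO, ENAME
from emp
where exists
(select 1 from dept
 where emp.DEPTNO = dept.DEPTNO
 and dept.loc = 'DALLAS');
```

If the two branches could overlap, the duplicates would have to be removed some other way, and that changes the cost picture again.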
Next, let's see how IN does. Don't many people say IN performs badly? Let's test it and find out.
SQL> explain plan for
select EMPNO,ENAME
from emp
where exists
(select DEPTNO from dept
where
emp.DEPTNO = dept.DEPTNO
and dept.loc = 'BOSTON'
and rownum<=1
);

Explained.

SQL> set linesize 400
SQL> SELECT * FROM TABLE(dbms_xplan.display);

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------
Plan hash value: 3414630506

-----------------------------------------------------------------------------------------
| Id  | Operation                     | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |         |     5 |    65 |     6   (0)| 00:00:01 |
|*  1 |  FILTER                       |         |       |       |            |          |
|   2 |   TABLE ACCESS FULL           | EMP     |    14 |   182 |     3   (0)| 00:00:01 |
|*  3 |   COUNT STOPKEY               |         |       |       |            |          |
|*  4 |    TABLE ACCESS BY INDEX ROWID| DEPT    |     1 |    11 |     1   (0)| 00:00:01 |
|*  5 |     INDEX UNIQUE SCAN         | PK_DEPT |     1 |       |     0   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter( EXISTS (SELECT 0 FROM "DEPT" "DEPT" WHERE ROWNUM<=1 AND
              "DEPT"."DEPTNO"=:B1 AND "DEPT"."LOC"='BOSTON'))
   3 - filter(ROWNUM<=1)
   4 - filter("DEPT"."LOC"='BOSTON')
   5 - access("DEPT"."DEPTNO"=:B1)

21 rows selected.

SQL> explain plan for
select EMPNO,ENAME
from emp
where DEPTNO in
(select DEPTNO from dept
where
dept.loc = 'BOSTON'
and rownum<=1
);

Explained.

SQL> set linesize 400
SQL> SELECT * FROM TABLE(dbms_xplan.display);

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------
Plan hash value: 3841060209

---------------------------------------------------------------------------------
| Id  | Operation            | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |          |     5 |   130 |     7  (15)| 00:00:01 |
|*  1 |  HASH JOIN SEMI      |          |     5 |   130 |     7  (15)| 00:00:01 |
|   2 |   TABLE ACCESS FULL  | EMP      |    14 |   182 |     3   (0)| 00:00:01 |
|   3 |   VIEW               | VW_NSO_1 |     1 |    13 |     3   (0)| 00:00:01 |
|*  4 |    COUNT STOPKEY     |          |       |       |            |          |
|*  5 |     TABLE ACCESS FULL| DEPT     |     1 |    11 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("DEPTNO"="DEPTNO")
   4 - filter(ROWNUM<=1)
   5 - filter("DEPT"."LOC"='BOSTON')

19 rows selected.

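The two statements above use EXISTS and IN respectively. Oddly, the EXISTS version gets a FILTER while the IN version gets a HASH JOIN. If both tables were large, the IN version would clearly be the faster of the two.
So, as I said, don't rush to judge whether EXISTS or IN is better. Let the facts speak (the execution plan and the execution time).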
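To get those facts from a real execution rather than from EXPLAIN PLAN estimates, one option is the sketch below. It assumes a SQL*Plus session with access to the V$ views that `dbms_xplan.display_cursor` reads, and it reuses the union all query from the first example. The `gather_plan_statistics` hint and the `'ALLSTATS LAST'` format are standard Oracle features; the Starts column in the output shows how many times each plan step actually ran, which is exactly the number that matters for a FILTER.

```sql
-- SQL*Plus settings: print elapsed time, and keep DBMS_OUTPUT from
-- becoming the "last statement" that display_cursor would report on.
set timing on
set serveroutput off

-- Execute the statement and collect per-step runtime statistics.
select /*+ gather_plan_statistics */ EMPNO, ENAME
from emp
where exists
(select DEPTNO from dept
 where emp.DEPTNO = dept.DEPTNO
 and dept.loc = 'BOSTON'
 union all
 select DEPTNO from dept
 where emp.DEPTNO = dept.DEPTNO
 and dept.loc = 'DALLAS');

-- Plan of the last statement run in this session, with Starts,
-- A-Rows, and Buffers for each step.
select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));
```

No output is shown because the numbers depend on the data and the Oracle version in use.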
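There is also a pair of hints worth introducing, unnest / no_unnest, which exist precisely so that you can intervene manually in whether a FILTER is used.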
SQL> explain plan for
select EMPNO,ename
from emp
where exists
(select /*+ unnest */ DEPTNO from dept
where
emp.DEPTNO = dept.DEPTNO
and dept.loc = 'BOSTON'
union
select DEPTNO from dept
where
emp.DEPTNO = dept.DEPTNO
and dept.loc = 'DALLAS'
);

Explained.

SQL> set linesize 400
SQL> SELECT * FROM TABLE(dbms_xplan.display);

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------
Plan hash value: 3446838818

---------------------------------------------------------------------------------
| Id  | Operation             | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |         |     5 |   130 |    12  (25)| 00:00:01 |
|*  1 |  HASH JOIN SEMI       |         |     5 |   130 |    12  (25)| 00:00:01 |
|   2 |   TABLE ACCESS FULL   | EMP     |    14 |   182 |     3   (0)| 00:00:01 |
|   3 |   VIEW                | VW_SQ_1 |     2 |    26 |     8  (25)| 00:00:01 |
|   4 |    SORT UNIQUE        |         |     1 |    22 |     8  (63)| 00:00:01 |
|   5 |     UNION-ALL         |         |       |       |            |          |
|*  6 |      TABLE ACCESS FULL| DEPT    |     1 |    11 |     3   (0)| 00:00:01 |
|*  7 |      TABLE ACCESS FULL| DEPT    |     1 |    11 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("EMP"."DEPTNO"="VW_COL_1")
   6 - filter("DEPT"."LOC"='BOSTON')
   7 - filter("DEPT"."LOC"='DALLAS')

21 rows selected.

SQL> explain plan for
select EMPNO,ENAME
from emp
where DEPTNO in
(select /*+ unnest */ DEPTNO from dept
where
dept.loc = 'BOSTON'
and rownum<=1
);

Explained.

SQL> set linesize 400
SQL> SELECT * FROM TABLE(dbms_xplan.display);

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------
Plan hash value: 3841060209

---------------------------------------------------------------------------------
| Id  | Operation            | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |          |     5 |   130 |     7  (15)| 00:00:01 |
|*  1 |  HASH JOIN SEMI      |          |     5 |   130 |     7  (15)| 00:00:01 |
|   2 |   TABLE ACCESS FULL  | EMP      |    14 |   182 |     3   (0)| 00:00:01 |
|   3 |   VIEW               | VW_NSO_1 |     1 |    13 |     3   (0)| 00:00:01 |
|*  4 |    COUNT STOPKEY     |          |       |       |            |          |
|*  5 |     TABLE ACCESS FULL| DEPT     |     1 |    11 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("DEPTNO"="DEPTNO")
   4 - filter(ROWNUM<=1)
   5 - filter("DEPT"."LOC"='BOSTON')

19 rows selected.

That is the second way to optimize the SQL from the start of this article: a single hint takes care of it.
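For reference, here is a minimal sketch of both hints in their correct comment syntax, `/*+ ... */`. The first statement applies unnest to the union all query from the first example; the second uses no_unnest to ask for the opposite behavior on a plain IN subquery. Hints are requests: the optimizer ignores unnest when it considers the transformation illegal for a given subquery, and that can differ between Oracle versions. No plan output is shown here for that reason, so confirm the result with dbms_xplan.display.

```sql
-- Sketch: ask the optimizer to unnest the UNION ALL subquery
-- (the hint goes in the first branch of the subquery).
select EMPNO, ENAME
from emp
where exists
(select /*+ unnest */ DEPTNO from dept
 where emp.DEPTNO = dept.DEPTNO
 and dept.loc = 'BOSTON'
 union all
 select DEPTNO from dept
 where emp.DEPTNO = dept.DEPTNO
 and dept.loc = 'DALLAS');

-- The opposite request: keep the subquery nested,
-- so it is evaluated as a FILTER.
select EMPNO, ENAME
from emp
where DEPTNO in
(select /*+ no_unnest */ DEPTNO from dept
 where dept.loc = 'BOSTON');
```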
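By now you probably regard FILTER as a monster. It isn't. Whatever exists has a reason to exist: if FILTER were useless, why would Oracle keep it? The point is only that in most cases FILTER deserves extra attention, because it is a common source of trouble. For the situations where FILTER is appropriate, see my earlier article on scalar subqueries.
Finally, my personal opinion. In most cases I prefer IN (in Oracle), because IN leaves more room for adjusting the execution plan. When the subquery contains union all, rownum, start with ... connect by, or cube, a FILTER is more likely to appear, because the subquery gets fixed in place and cannot be merged, and EXISTS then passes values through the join key into the inner table as a FILTER. I said this in an earlier article as well.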
SQL> explain plan for
select EMPNO,ename
from emp
where exists
(select DEPTNO from dept
where
emp.DEPTNO = dept.DEPTNO
and dept.loc = 'BOSTON'
union
select DEPTNO from dept
where
emp.DEPTNO = dept.DEPTNO
and dept.loc = 'DALLAS'
);

Explained.

SQL> set linesize 400
SQL> SELECT * FROM TABLE(dbms_xplan.display);

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------
Plan hash value: 3446838818

---------------------------------------------------------------------------------
| Id  | Operation             | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |         |     5 |   130 |    12  (25)| 00:00:01 |
|*  1 |  HASH JOIN SEMI       |         |     5 |   130 |    12  (25)| 00:00:01 |
|   2 |   TABLE ACCESS FULL   | EMP     |    14 |   182 |     3   (0)| 00:00:01 |
|   3 |   VIEW                | VW_SQ_1 |     2 |    26 |     8  (25)| 00:00:01 |
|   4 |    SORT UNIQUE        |         |     1 |    22 |     8  (63)| 00:00:01 |
|   5 |     UNION-ALL         |         |       |       |            |          |
|*  6 |      TABLE ACCESS FULL| DEPT    |     1 |    11 |     3   (0)| 00:00:01 |
|*  7 |      TABLE ACCESS FULL| DEPT    |     1 |    11 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("EMP"."DEPTNO"="VW_COL_1")
   6 - filter("DEPT"."LOC"='BOSTON')
   7 - filter("DEPT"."LOC"='DALLAS')

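This is my test case with union. Here EXISTS does not appear to use a FILTER, which is different from the union all case.
That's all for today.
These are my personal views; corrections are welcome.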