SQL Optimization and Subqueries: Which Is Faster, IN or EXISTS?

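Anyone with some understanding of SQL tuning knows that its core is reducing the number of physical I/Os. Put plainly, we want to scan tables as few times as possible, and large tables in particular.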

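By "subquery" today I mean any SQL statement that contains IN, NOT IN, EXISTS, or NOT EXISTS. People often ask how to choose between IN and EXISTS. Some say EXISTS performs better; others say the choice depends on the sizes of the inner and outer tables. In my view these rules are one-sided. To know which is better, we still have to look at the execution plan and the execution time.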

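When a statement contains IN, NOT IN, EXISTS, or NOT EXISTS, the optimizer tries to rewrite it. Why? Because these constructs can produce an operation called FILTER. That should sound familiar, since I covered it in an earlier article; if it does not, look back through my previous posts.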
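As a rough illustration of what such a rewrite (subquery unnesting) amounts to, the two statements below return the same rows. This is only a hand-written sketch against the emp/dept demo tables used in this article; the aliases e, d, and v are my own, and the optimizer's internal transformation is not literally this text.

```sql
-- Original form: membership test via IN
select e.empno, e.ename
from   emp e
where  e.deptno in (select d.deptno from dept d where d.loc = 'BOSTON');

-- Hand-written equivalent: join to the distinct set of matching keys,
-- so each emp row is returned at most once (what a semi join guarantees)
select e.empno, e.ename
from   emp e
join   (select distinct d.deptno from dept d where d.loc = 'BOSTON') v
on     e.deptno = v.deptno;
```

Once the subquery is expressed as a join, the optimizer can choose among join methods instead of probing the subquery row by row.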

Let's look at an example:

SQL> explain plan for
select EMPNO, ENAME
from emp
where exists
  (select DEPTNO from dept
   where emp.DEPTNO = dept.DEPTNO
   and dept.loc = 'BOSTON'
   union all
   select DEPTNO from dept
   where emp.DEPTNO = dept.DEPTNO
   and dept.loc = 'DALLAS'
  );

Explained.

SQL> set linesize 400
SQL> SELECT * FROM TABLE(dbms_xplan.display);

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------
Plan hash value: 1230022885

-----------------------------------------------------------------------------------------
| Id  | Operation                     | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |         |     5 |    65 |     9   (0)| 00:00:01 |
|*  1 |  FILTER                       |         |       |       |            |          |
|   2 |   TABLE ACCESS FULL           | EMP     |    14 |   182 |     3   (0)| 00:00:01 |
|   3 |   UNION-ALL                   |         |       |       |            |          |
|*  4 |    TABLE ACCESS BY INDEX ROWID| DEPT    |     1 |    11 |     1   (0)| 00:00:01 |
|*  5 |     INDEX UNIQUE SCAN         | PK_DEPT |     1 |       |     0   (0)| 00:00:01 |
|*  6 |    TABLE ACCESS BY INDEX ROWID| DEPT    |     1 |    11 |     1   (0)| 00:00:01 |
|*  7 |     INDEX UNIQUE SCAN         | PK_DEPT |     1 |       |     0   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter( EXISTS ( (SELECT "DEPTNO" FROM "DEPT" "DEPT" WHERE
              "DEPT"."DEPTNO"=:B1 AND "DEPT"."LOC"='BOSTON') UNION ALL (SELECT "DEPTNO" FROM
              "DEPT" "DEPT" WHERE "DEPT"."DEPTNO"=:B2 AND "DEPT"."LOC"='DALLAS')))
   4 - filter("DEPT"."LOC"='BOSTON')
   5 - access("DEPT"."DEPTNO"=:B1)
   6 - filter("DEPT"."LOC"='DALLAS')
   7 - access("DEPT"."DEPTNO"=:B1)

SQL> explain plan for
select EMPNO, ENAME
from emp
where DEPTNO in
  (select DEPTNO from dept
   where dept.loc = 'BOSTON'
   union all
   select DEPTNO from dept
   where dept.loc = 'DALLAS'
  );

Explained.

SQL> set linesize 400
SQL> SELECT * FROM TABLE(dbms_xplan.display);

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------
Plan hash value: 1230022885

-----------------------------------------------------------------------------------------
| Id  | Operation                     | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |         |     9 |   117 |     9   (0)| 00:00:01 |
|*  1 |  FILTER                       |         |       |       |            |          |
|   2 |   TABLE ACCESS FULL           | EMP     |    14 |   182 |     3   (0)| 00:00:01 |
|   3 |   UNION-ALL                   |         |       |       |            |          |
|*  4 |    TABLE ACCESS BY INDEX ROWID| DEPT    |     1 |    11 |     1   (0)| 00:00:01 |
|*  5 |     INDEX UNIQUE SCAN         | PK_DEPT |     1 |       |     0   (0)| 00:00:01 |
|*  6 |    TABLE ACCESS BY INDEX ROWID| DEPT    |     1 |    11 |     1   (0)| 00:00:01 |
|*  7 |     INDEX UNIQUE SCAN         | PK_DEPT |     1 |       |     0   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter( EXISTS ( (SELECT "DEPTNO" FROM "DEPT" "DEPT" WHERE "DEPTNO"=:B1
              AND "DEPT"."LOC"='BOSTON') UNION ALL (SELECT "DEPTNO" FROM "DEPT" "DEPT" WHERE
              "DEPTNO"=:B2 AND "DEPT"."LOC"='DALLAS')))
   4 - filter("DEPT"."LOC"='BOSTON')
   5 - access("DEPTNO"=:B1)
   6 - filter("DEPT"."LOC"='DALLAS')
   7 - access("DEPTNO"=:B1)

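The exists/in (xxx union all yyyy) pattern will look familiar. Plenty of people have written SQL like this, and I have seen it more than once, but my guess is that few of them ever checked the execution plan. When the data volume is small nothing matters, and even bad SQL gets by. What we have to think about is the future: once the data grows, we find that the SQL no longer finishes.

So let's analyze the execution plan. See the FILTER at Id=1 and its bind variable :B1? Pay close attention to it; this is the important part.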

   1 - filter( EXISTS ( (SELECT "DEPTNO" FROM "DEPT" "DEPT" WHERE
              "DEPT"."DEPTNO"=:B1 AND "DEPT"."LOC"='BOSTON') UNION ALL (SELECT "DEPTNO" FROM
              "DEPT" "DEPT" WHERE "DEPT"."DEPTNO"=:B2 AND "DEPT"."LOC"='DALLAS')))

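What does this predicate tell us?

It tells us that emp passes each value of the join key deptno down to the dept subquery beneath it, and the subquery is executed for that value. Suppose deptno in emp is unique, with no repeated values, and emp has 10 million rows. How many times does the subquery run, and how many times is dept scanned? 10 million times. If each of those executions has to scan a 10 GB dept table, that is 10 million scans of 10 GB. Will your SQL ever finish?

Don't wait for it. Just cancel.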
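One way to gauge how bad this could get on real data is to count the distinct values of the correlated column, since the subquery under FILTER is driven by the values passed down as :B1. The query below is a minimal sketch assuming the emp table from the example; the column aliases are my own.

```sql
-- Rough upper bound on subquery executions under FILTER:
-- about one per distinct DEPTNO value passed down as :B1
select count(*)               as total_rows,
       count(distinct deptno) as distinct_deptno
from   emp;
```

If distinct_deptno is close to total_rows, the subquery runs roughly once per emp row, which is the worst case described above.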

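I recommend rewriting this pattern. The first approach is to pull the UNION ALL apart and write the statement as two pieces. I will leave the details as an exercise: try it yourself and see how it performs.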
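One possible reading of "two pieces" is sketched below: each branch of the UNION ALL becomes its own statement with a simple EXISTS, and the two results are combined. This is my own interpretation rather than the author's exact rewrite, it assumes EMPNO is unique in emp, and the resulting plan has not been verified here.

```sql
-- Piece 1: a simple correlated EXISTS, which the optimizer
-- is normally free to unnest into a semi join
select EMPNO, ENAME
from   emp
where  exists (select null from dept
               where  emp.DEPTNO = dept.DEPTNO
               and    dept.loc = 'BOSTON')
union
-- Piece 2: same shape for the other branch; UNION (not UNION ALL)
-- so a row matching both branches would appear only once
select EMPNO, ENAME
from   emp
where  exists (select null from dept
               where  emp.DEPTNO = dept.DEPTNO
               and    dept.loc = 'DALLAS');
```

Run it through explain plan as in the examples above to confirm whether the FILTER at Id=1 is gone.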

Now let's see how IN does. Don't many people say IN performs badly? Let's test it and find out.

SQL> explain plan for
select EMPNO, ENAME
from emp
where exists
  (select DEPTNO from dept
   where emp.DEPTNO = dept.DEPTNO
   and dept.loc = 'BOSTON'
   and rownum <= 1
  );

Explained.

SQL> set linesize 400
SQL> SELECT * FROM TABLE(dbms_xplan.display);

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------
Plan hash value: 3414630506

-----------------------------------------------------------------------------------------
| Id  | Operation                     | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |         |     5 |    65 |     6   (0)| 00:00:01 |
|*  1 |  FILTER                       |         |       |       |            |          |
|   2 |   TABLE ACCESS FULL           | EMP     |    14 |   182 |     3   (0)| 00:00:01 |
|*  3 |   COUNT STOPKEY               |         |       |       |            |          |
|*  4 |    TABLE ACCESS BY INDEX ROWID| DEPT    |     1 |    11 |     1   (0)| 00:00:01 |
|*  5 |     INDEX UNIQUE SCAN         | PK_DEPT |     1 |       |     0   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter( EXISTS (SELECT 0 FROM "DEPT" "DEPT" WHERE ROWNUM<=1 AND
              "DEPT"."DEPTNO"=:B1 AND "DEPT"."LOC"='BOSTON'))
   3 - filter(ROWNUM<=1)
   4 - filter("DEPT"."LOC"='BOSTON')
   5 - access("DEPT"."DEPTNO"=:B1)

21 rows selected.

SQL> explain plan for
select EMPNO, ENAME
from emp
where DEPTNO in
  (select DEPTNO from dept
   where dept.loc = 'BOSTON'
   and rownum <= 1
  );

Explained.

SQL> set linesize 400
SQL> SELECT * FROM TABLE(dbms_xplan.display);

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------
Plan hash value: 3841060209

---------------------------------------------------------------------------------
| Id  | Operation            | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |          |     5 |   130 |     7  (15)| 00:00:01 |
|*  1 |  HASH JOIN SEMI      |          |     5 |   130 |     7  (15)| 00:00:01 |
|   2 |   TABLE ACCESS FULL  | EMP      |    14 |   182 |     3   (0)| 00:00:01 |
|   3 |   VIEW               | VW_NSO_1 |     1 |    13 |     3   (0)| 00:00:01 |
|*  4 |    COUNT STOPKEY     |          |       |       |            |          |
|*  5 |     TABLE ACCESS FULL| DEPT     |     1 |    11 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("DEPTNO"="DEPTNO")
   4 - filter(ROWNUM<=1)
   5 - filter("DEPT"."LOC"='BOSTON')

19 rows selected.

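The two statements above use EXISTS and IN respectively. Oddly, EXISTS got a FILTER while IN got a HASH JOIN SEMI. If both tables were large, IN would clearly be the better performer here. (Keep in mind that the two statements are not equivalent in general: in the IN version, ROWNUM <= 1 keeps a single dept row for the whole statement, while in the EXISTS version it applies separately for each emp row. They return the same rows only when at most one dept row has loc = 'BOSTON'.)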

That is why I say: do not jump to conclusions about whether EXISTS or IN is better. Let the facts (execution plan and execution time) speak.
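To collect those facts at run time rather than from an estimated plan, one option is the gather_plan_statistics hint together with dbms_xplan.display_cursor. The sketch below assumes a SQL*Plus session with access to the relevant V$ views; the query itself is just the EXISTS example from this article without the rownum condition.

```sql
-- display_cursor(null, null, ...) reports the last statement run in
-- this session, so serveroutput must be off in SQL*Plus
set serveroutput off
set timing on

select /*+ gather_plan_statistics */ EMPNO, ENAME
from   emp
where  exists (select null from dept
               where  emp.DEPTNO = dept.DEPTNO
               and    dept.loc = 'BOSTON');

-- Starts = number of times each plan step was executed;
-- A-Rows = actual rows returned by each step
select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));
```

The Starts column is the one to watch for the steps under a FILTER: it shows how many times the subquery was actually executed.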

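There is also a pair of hints worth introducing, unnest / no_unnest, which exist precisely so that you can intervene manually in whether a FILTER is used.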

SQL> explain plan for
select EMPNO, ENAME
from emp
where exists
  (select /*+ unnest */ DEPTNO from dept
   where emp.DEPTNO = dept.DEPTNO
   and dept.loc = 'BOSTON'
   union
   select DEPTNO from dept
   where emp.DEPTNO = dept.DEPTNO
   and dept.loc = 'DALLAS'
  );

Explained.

SQL> set linesize 400
SQL> SELECT * FROM TABLE(dbms_xplan.display);

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------
Plan hash value: 3446838818

---------------------------------------------------------------------------------
| Id  | Operation             | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |         |     5 |   130 |    12  (25)| 00:00:01 |
|*  1 |  HASH JOIN SEMI       |         |     5 |   130 |    12  (25)| 00:00:01 |
|   2 |   TABLE ACCESS FULL   | EMP     |    14 |   182 |     3   (0)| 00:00:01 |
|   3 |   VIEW                | VW_SQ_1 |     2 |    26 |     8  (25)| 00:00:01 |
|   4 |    SORT UNIQUE        |         |     1 |    22 |     8  (63)| 00:00:01 |
|   5 |     UNION-ALL         |         |       |       |            |          |
|*  6 |      TABLE ACCESS FULL| DEPT    |     1 |    11 |     3   (0)| 00:00:01 |
|*  7 |      TABLE ACCESS FULL| DEPT    |     1 |    11 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("EMP"."DEPTNO"="VW_COL_1")
   6 - filter("DEPT"."LOC"='BOSTON')
   7 - filter("DEPT"."LOC"='DALLAS')

21 rows selected.

SQL> explain plan for
select EMPNO, ENAME
from emp
where DEPTNO in
  (select /*+ unnest */ DEPTNO from dept
   where dept.loc = 'BOSTON'
   and rownum <= 1
  );

Explained.

SQL> set linesize 400
SQL> SELECT * FROM TABLE(dbms_xplan.display);

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------
Plan hash value: 3841060209

---------------------------------------------------------------------------------
| Id  | Operation            | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |          |     5 |   130 |     7  (15)| 00:00:01 |
|*  1 |  HASH JOIN SEMI      |          |     5 |   130 |     7  (15)| 00:00:01 |
|   2 |   TABLE ACCESS FULL  | EMP      |    14 |   182 |     3   (0)| 00:00:01 |
|   3 |   VIEW               | VW_NSO_1 |     1 |    13 |     3   (0)| 00:00:01 |
|*  4 |    COUNT STOPKEY     |          |       |       |            |          |
|*  5 |     TABLE ACCESS FULL| DEPT     |     1 |    11 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("DEPTNO"="DEPTNO")
   4 - filter(ROWNUM<=1)
   5 - filter("DEPT"."LOC"='BOSTON')

19 rows selected.

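That is the second way to optimize the SQL from the start of this article: a single hint takes care of it. (Note that the hinted EXISTS test above uses UNION rather than the original UNION ALL.)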
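The opposite hint, no_unnest, asks the optimizer to keep the subquery nested. The statement below is a small sketch of how one might check its effect on the emp/dept tables; I would expect the plan to show a FILTER instead of a join, but it has not been run here, so treat that outcome as an assumption to verify.

```sql
explain plan for
select EMPNO, ENAME
from   emp
where  DEPTNO in
  (select /*+ no_unnest */ DEPTNO from dept
   where  dept.loc = 'BOSTON');

select * from table(dbms_xplan.display);
```

Comparing this plan with the unhinted one is a quick way to see what unnesting buys for a given query.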

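By now you probably regard FILTER as some kind of monster. It is not. Everything exists for a reason: if FILTER were useless, why would Oracle keep it? The point is only that, in most cases, FILTER deserves extra attention because it is a common source of trouble. For the situations where FILTER is appropriate, see my earlier article on scalar subqueries.

Now for my personal opinion. In most cases I prefer IN (in Oracle), because IN leaves more room to adjust the execution plan. When the subquery contains UNION ALL, ROWNUM, START WITH ... CONNECT BY, or CUBE, a FILTER is more likely to appear, because the subquery is frozen in place and cannot be unnested, and EXISTS then passes values through the join key into the inner table to do the FILTER. I have covered this in earlier articles as well.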

SQL> explain plan for
select EMPNO, ENAME
from emp
where exists
  (select DEPTNO from dept
   where emp.DEPTNO = dept.DEPTNO
   and dept.loc = 'BOSTON'
   union
   select DEPTNO from dept
   where emp.DEPTNO = dept.DEPTNO
   and dept.loc = 'DALLAS'
  );

Explained.

SQL> set linesize 400
SQL> SELECT * FROM TABLE(dbms_xplan.display);

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------
Plan hash value: 3446838818

---------------------------------------------------------------------------------
| Id  | Operation             | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |         |     5 |   130 |    12  (25)| 00:00:01 |
|*  1 |  HASH JOIN SEMI       |         |     5 |   130 |    12  (25)| 00:00:01 |
|   2 |   TABLE ACCESS FULL   | EMP     |    14 |   182 |     3   (0)| 00:00:01 |
|   3 |   VIEW                | VW_SQ_1 |     2 |    26 |     8  (25)| 00:00:01 |
|   4 |    SORT UNIQUE        |         |     1 |    22 |     8  (63)| 00:00:01 |
|   5 |     UNION-ALL         |         |       |       |            |          |
|*  6 |      TABLE ACCESS FULL| DEPT    |     1 |    11 |     3   (0)| 00:00:01 |
|*  7 |      TABLE ACCESS FULL| DEPT    |     1 |    11 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("EMP"."DEPTNO"="VW_COL_1")
   6 - filter("DEPT"."LOC"='BOSTON')
   7 - filter("DEPT"."LOC"='DALLAS')

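This is my test case with UNION, without any hint. Here EXISTS apparently did not use a FILTER, which is different from the UNION ALL case.

That's all for today.

These are just my personal views; corrections are welcome.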


Source: https://blog.csdn.net/qq_42894764/article/details/92639090
