Question
Over the years I have heard a lot of people say that "join" operators are preferred over "NOT EXISTS". Why?
Answer 1:
In MySQL, Oracle, SQL Server and PostgreSQL, NOT EXISTS is of the same efficiency or even more efficient than LEFT JOIN / IS NULL.
While it may seem that "the inner query should be executed for each record from the outer query" (which seems to be bad for NOT EXISTS and even worse for NOT IN, since the latter query is not even correlated), it may be optimized just as well as all other queries are optimized, using appropriate anti-join methods.
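For reference, the three anti-join formulations being compared can be sketched as follows. This is only a minimal illustration: it uses Python's built-in sqlite3 module as a convenient runnable engine (an assumption, since SQLite is not one of the engines discussed here), and the table and column names (t_left, t_right, value) are hypothetical. It shows that the three forms return the same rows, not how any particular engine plans them.

```python
import sqlite3

# In-memory SQLite database; hypothetical tables used only for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Columns are NOT NULL so that NOT IN behaves like the other two forms.
    CREATE TABLE t_left  (id INTEGER PRIMARY KEY, value INTEGER NOT NULL);
    CREATE TABLE t_right (id INTEGER PRIMARY KEY, value INTEGER NOT NULL);
    INSERT INTO t_left  (value) VALUES (1), (2), (3), (4);
    INSERT INTO t_right (value) VALUES (2), (4), (4);
""")

# Three ways to ask for rows of t_left that have no match in t_right.
QUERIES = {
    "NOT EXISTS": """
        SELECT l.value FROM t_left l
        WHERE NOT EXISTS (SELECT 1 FROM t_right r WHERE r.value = l.value)
        ORDER BY l.value
    """,
    "NOT IN": """
        SELECT l.value FROM t_left l
        WHERE l.value NOT IN (SELECT r.value FROM t_right r)
        ORDER BY l.value
    """,
    "LEFT JOIN / IS NULL": """
        SELECT l.value FROM t_left l
        LEFT JOIN t_right r ON r.value = l.value
        WHERE r.value IS NULL
        ORDER BY l.value
    """,
}

results = {name: [row[0] for row in conn.execute(sql)]
           for name, sql in QUERIES.items()}

for name, rows in results.items():
    print(name, rows)
```

To compare the forms on a real engine, run the same three statements there and inspect the execution plans it reports.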
In SQL Server, actually, LEFT JOIN / IS NULL may be less efficient than NOT EXISTS / NOT IN in the case of an unindexed or low-cardinality column in the inner table.
It is often heard that MySQL is "especially bad in treating subqueries".
This stems from the fact that MySQL (before version 8.0.18, which added hash joins) is not capable of any join methods other than nested loops, which severely limits its optimization abilities.
The only case when a query would benefit from rewriting subquery as a join would be this:
SELECT *
FROM big_table
WHERE big_table_column IN
(
SELECT small_table_column
FROM small_table
)
small_table will not be queried completely for each record in big_table: though it does not seem to be correlated, it will be implicitly correlated by the query optimizer and in fact rewritten to an EXISTS (using index_subquery to search for the first match, if needed, if small_table_column is indexed).
But big_table would always be leading, which makes the query complete in big * LOG(small) rather than small * LOG(big) reads.
This could be rewritten as
SELECT DISTINCT bt.*
FROM small_table st
JOIN big_table bt
ON bt.big_table_column = st.small_table_column
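As a quick check that the IN form and the DISTINCT join form select the same rows, here is a small sketch. It again uses Python's sqlite3 module purely as a runnable engine (an assumption, since the discussion is about MySQL), and the sample data and the id column are invented. DISTINCT is needed because small_table may hold duplicate values. It would also collapse duplicate rows of big_table, so the rewrite is an exact equivalent only when big_table rows are unique, which the hypothetical primary key guarantees here.

```python
import sqlite3

# In-memory SQLite database with invented sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE big_table (id INTEGER PRIMARY KEY, big_table_column INTEGER);
    CREATE TABLE small_table (small_table_column INTEGER);
    CREATE INDEX ix_big ON big_table (big_table_column);
    INSERT INTO big_table (big_table_column) VALUES (1), (2), (2), (3), (5);
    INSERT INTO small_table VALUES (2), (2), (5), (7);
""")

# Original form: subquery with IN.
in_rows = conn.execute("""
    SELECT * FROM big_table
    WHERE big_table_column IN (SELECT small_table_column FROM small_table)
    ORDER BY id
""").fetchall()

# Rewritten form: small_table leads, DISTINCT removes the duplicates
# produced by repeated values in small_table.
join_rows = conn.execute("""
    SELECT DISTINCT bt.*
    FROM small_table st
    JOIN big_table bt ON bt.big_table_column = st.small_table_column
    ORDER BY bt.id
""").fetchall()

# Without DISTINCT the join returns one row per matching pair.
plain_join_count = conn.execute("""
    SELECT COUNT(*)
    FROM small_table st
    JOIN big_table bt ON bt.big_table_column = st.small_table_column
""").fetchone()[0]

print(in_rows == join_rows, len(in_rows), plain_join_count)
```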
However, this won't improve NOT IN (as opposed to IN). In MySQL, NOT EXISTS and LEFT JOIN / IS NULL are almost the same, since with nested loops the left table should always be leading in a LEFT JOIN.
You may want to read these articles:
- NOT IN vs. NOT EXISTS vs. LEFT JOIN / IS NULL: SQL Server
- NOT IN vs. NOT EXISTS vs. LEFT JOIN / IS NULL: PostgreSQL
- NOT IN vs. NOT EXISTS vs. LEFT JOIN / IS NULL: Oracle
- NOT IN vs. NOT EXISTS vs. LEFT JOIN / IS NULL: MySQL
- IN vs. JOIN vs. EXISTS: Oracle
- IN vs. JOIN vs. EXISTS (SQL Server)
Answer 2:
It may have to do with the optimization process... NOT EXISTS implies a subquery, and "optimizers" usually don't do subqueries justice. On the other hand, joins can be dealt with more easily...
Answer 3:
I think this is a MySQL-specific case. MySQL does not optimize subqueries in IN / NOT IN / ANY / NOT EXISTS clauses, and actually performs the subquery for each row matched by the outer query. Because of this, in MySQL you should use a join. In PostgreSQL, however, you can just use a subquery.
Source: https://stackoverflow.com/questions/6777347/is-using-not-exists-considered-to-be-bad-sql-practise