I'm curious about how the execution of EXISTS() is supposed to be faster than IN().
I was answering a question when Bill Karwin brought up
This depends on the MySQL version: there is a bug in the MySQL query optimizer in versions up to 6.0.
Subqueries with IN were not optimized correctly (they were executed again and again, like dependent ones). This bug does not affect EXISTS queries or joins.
The problem is that, for a statement that uses an IN subquery, the optimizer rewrites it as a correlated subquery. Consider the following statement that uses an uncorrelated subquery:
SELECT ... FROM t1 WHERE t1.a IN (SELECT b FROM t2);
The optimizer rewrites the statement to a correlated subquery:
SELECT ... FROM t1 WHERE EXISTS (SELECT 1 FROM t2 WHERE t2.b = t1.a);
If the inner and outer queries return M and N rows, respectively, the execution time becomes on the order of O(M×N), rather than O(M+N) as it would be for an uncorrelated subquery.
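To make the O(M×N) versus O(M+N) difference concrete, here is a rough Python sketch that counts the work done under each strategy. It is only a model, not MySQL's actual executor: the function names and the sample data are made up for illustration, and it assumes there is no usable index on t2.b, so the dependent subquery has to scan every inner row each time.

```python
def in_as_correlated(outer_rows, inner_rows):
    """Model a dependent subquery: re-scan the inner rows for every outer row."""
    comparisons = 0
    matches = []
    for a in outer_rows:
        for b in inner_rows:
            comparisons += 1
            if a == b:
                matches.append(a)
                break  # EXISTS can stop at the first matching inner row
    return matches, comparisons


def in_as_uncorrelated(outer_rows, inner_rows):
    """Model an uncorrelated subquery: evaluate it once, then probe per outer row."""
    operations = 0
    lookup = set()
    for b in inner_rows:
        operations += 1  # one pass over the inner rows to build the lookup
        lookup.add(b)
    matches = []
    for a in outer_rows:
        operations += 1  # one hash probe per outer row
        if a in lookup:
            matches.append(a)
    return matches, operations


# Hypothetical data: N = 1000 outer rows, M = 100 inner rows, no overlap,
# so the correlated version never gets to stop a scan early.
t1 = list(range(1000))
t2 = list(range(5000, 5100))

slow_matches, slow_cost = in_as_correlated(t1, t2)
fast_matches, fast_cost = in_as_uncorrelated(t1, t2)
print(slow_cost, fast_cost)
```

Both functions return the same matches; only the amount of work differs. With an index on t2.b, each correlated lookup would be much cheaper than a full scan, so the gap in a real database depends on the schema as well as the optimizer.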