Is using “NOT EXISTS” considered to be bad SQL practice?

Submitted by 孤人 on 2019-12-18 13:33:26

Question


I have heard a lot of people over the years say that:

"join" operators are preferred over “NOT EXISTS”

Why?


Answer 1:


In MySQL, Oracle, SQL Server and PostgreSQL, NOT EXISTS is as efficient as LEFT JOIN / IS NULL, or even more efficient.

While it may seem that "the inner query should be executed for each record from the outer query" (which seems to be bad for NOT EXISTS and even worse for NOT IN, since the latter query is not even correlated), it may be optimized just as well as all other queries are optimized, using appropriate anti-join methods.
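To make the comparison concrete, here is a minimal sketch of the three anti-join spellings being discussed. The tables t_left and t_right and their columns are hypothetical, invented purely for illustration, and the columns are declared NOT NULL on purpose so that all three forms return the same rows.

```sql
-- Hypothetical tables, not taken from the question.
CREATE TABLE t_left  (id INT NOT NULL PRIMARY KEY, val INT NOT NULL);
CREATE TABLE t_right (id INT NOT NULL PRIMARY KEY, val INT NOT NULL);

INSERT INTO t_left  (id, val) VALUES (1, 10);
INSERT INTO t_left  (id, val) VALUES (2, 20);
INSERT INTO t_left  (id, val) VALUES (3, 30);
INSERT INTO t_right (id, val) VALUES (1, 10);
INSERT INTO t_right (id, val) VALUES (2, 30);

-- 1. NOT EXISTS: correlated subquery
SELECT l.id, l.val
FROM   t_left l
WHERE  NOT EXISTS
       (
       SELECT 1
       FROM   t_right r
       WHERE  r.val = l.val
       );

-- 2. LEFT JOIN / IS NULL: keep only the left rows that found no match
SELECT l.id, l.val
FROM   t_left l
LEFT JOIN t_right r
ON     r.val = l.val
WHERE  r.val IS NULL;

-- 3. NOT IN: uncorrelated subquery
SELECT l.id, l.val
FROM   t_left l
WHERE  l.val NOT IN
       (
       SELECT r.val
       FROM   t_right r
       );

-- With the sample data above, each query returns the single row (2, 20).
```

Which physical plan each form gets is up to the optimizer of the particular engine, so comparing the actual execution plans is the only reliable way to see whether they are treated alike.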

In SQL Server, in fact, LEFT JOIN / IS NULL may be less efficient than NOT EXISTS / NOT IN when the column in the inner table is unindexed or has low cardinality.

It is often heard that MySQL is "especially bad in treating subqueries".

This stems from the fact that MySQL is not capable of any join methods other than nested loops (true when this answer was written; hash joins were only added in MySQL 8.0.18), which severely limits its optimization abilities.

The only case where a query would benefit from rewriting a subquery as a join is this:

SELECT  *
FROM    big_table
WHERE   big_table_column IN
        (
        SELECT  small_table_column
        FROM    small_table
        )

small_table will not be queried completely for each record in big_table: though the subquery does not seem to be correlated, it will be implicitly correlated by the query optimizer and in fact rewritten to an EXISTS (using index_subquery to search for the first match if needed, provided small_table_column is indexed).

But big_table would always be leading, which makes the query complete in big * LOG(small) rather than small * LOG(big) reads.
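The implicit correlation described above can be pictured as the following EXISTS query. This is only a conceptual sketch of what the IN query behaves like, not the literal form the optimizer produces internally.

```sql
-- Conceptual equivalent of the IN query: big_table drives the scan,
-- and each of its rows probes small_table for a first match.
SELECT  *
FROM    big_table bt
WHERE   EXISTS
        (
        SELECT  1
        FROM    small_table st
        WHERE   st.small_table_column = bt.big_table_column
        );
```

Written this way it is easier to see why big_table ends up leading: the probe into small_table happens once per big_table row.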

The IN query could be rewritten as a join, so that small_table can lead:

SELECT  DISTINCT bt.*
FROM    small_table st
JOIN    big_table bt
ON      bt.big_table_column = st.small_table_column
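One caveat with the DISTINCT bt.* form: it also collapses rows of big_table that are identical to each other, which the original IN query would have kept. If big_table can contain such duplicates (an assumption, since the question does not show its schema), a variant that deduplicates the small side first avoids the problem:

```sql
-- Deduplicate small_table first, then join; rows of big_table are
-- returned once per big_table row, as with the original IN query.
SELECT  bt.*
FROM    (
        SELECT  DISTINCT small_table_column
        FROM    small_table
        ) st
JOIN    big_table bt
ON      bt.big_table_column = st.small_table_column;
```

Whether the optimizer actually picks the derived table as the leading one is not guaranteed, so the plan should be checked before relying on this form.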

However, this won't improve NOT IN (as opposed to IN). In MySQL, NOT EXISTS and LEFT JOIN / IS NULL are almost the same, since with nested loops the left table should always be leading in a LEFT JOIN.

You may want to read these articles:

  • NOT IN vs. NOT EXISTS vs. LEFT JOIN / IS NULL: SQL Server
  • NOT IN vs. NOT EXISTS vs. LEFT JOIN / IS NULL: PostgreSQL
  • NOT IN vs. NOT EXISTS vs. LEFT JOIN / IS NULL: Oracle
  • NOT IN vs. NOT EXISTS vs. LEFT JOIN / IS NULL: MySQL
  • IN vs. JOIN vs. EXISTS: Oracle
  • IN vs. JOIN vs. EXISTS (SQL Server)



Answer 2:


It may have to do with the optimization process... NOT EXISTS implies a subquery, and "optimizers" usually don't do subqueries justice. On the other hand, joins can be dealt with more easily...




Answer 3:


I think this is a MySQL-specific case. MySQL does not optimize subqueries in IN / NOT IN / ANY / NOT EXISTS clauses, and actually performs the subquery for each row matched by the outer query. Because of this, in MySQL you should use a join. In PostgreSQL, however, you can just use a subquery.
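Rather than taking this on faith, the claim can be checked on a given server with EXPLAIN. The sketch below reuses the big_table / small_table names from Answer 1 purely as placeholders.

```sql
-- Ask MySQL how it plans to run the NOT EXISTS form.
EXPLAIN
SELECT  *
FROM    big_table bt
WHERE   NOT EXISTS
        (
        SELECT  1
        FROM    small_table st
        WHERE   st.small_table_column = bt.big_table_column
        );
```

In the EXPLAIN output, a select_type of DEPENDENT SUBQUERY indicates a subquery that is evaluated per outer row. The exact output depends on the server version, so it is worth running the same EXPLAIN on the join form and comparing the two plans.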



Source: https://stackoverflow.com/questions/6777347/is-using-not-exists-considered-to-be-bad-sql-practise
