Why is my SQL 'NOT IN' clause producing different results from 'NOT EXISTS'?

Question

I have two SQL queries that produce different results, although I would expect them to produce the same result. I am trying to find the number of events that do not have a corresponding location. All locations have an event, but events can also link to non-location records.

The following query produces a count of 16244, the correct value.

SELECT COUNT(DISTINCT e.event_id)   
FROM   events AS e   
WHERE  NOT EXISTS   
  (SELECT * FROM locations AS l WHERE l.event_id = e.event_id)

The following query produces a count of 0.

SELECT COUNT(DISTINCT e.event_id) 
FROM   events AS e
WHERE  e.event_id NOT IN (SELECT  l.event_id FROM locations AS l)

The following SQL summarizes the data set:

SELECT  'Event Count', 
        COUNT(DISTINCT event_id) 
        FROM events

UNION ALL

SELECT  'Locations Count', 
        COUNT(DISTINCT event_id) 
        FROM locations

UNION ALL

SELECT  'Event+Location Count', 
        COUNT(DISTINCT l.event_id) 
        FROM locations AS l JOIN events AS e ON l.event_id = e.event_id

It returns the following results:

Event Count             139599
Locations Count         123355
Event+Location Count    123355

Can anyone shed any light on why the two initial queries do not produce the same figure?

Answer 1:

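You have a NULL in the results of the subquery SELECT l.event_id FROM locations AS l. Because of that, NOT IN can never evaluate to TRUE: it is FALSE for every event_id that appears in locations and UNKNOWN for every other one, so the WHERE clause rejects every row and the query returns 0.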

SELECT COUNT(DISTINCT e.event_id) 
FROM   events AS e
WHERE  e.event_id NOT IN (SELECT  l.event_id FROM locations AS l)

The reason for this behaviour can be seen from the example below.

'x' NOT IN (NULL,'a','b')

≡ 'x' <> NULL and 'x' <> 'a' and 'x' <> 'b'

≡ Unknown and True and True

≡ Unknown
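The same three-valued logic can be observed by evaluating these expressions directly. The sketch below uses Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for SQL Server 2005; that substitution is an assumption, justified only because both follow the standard rule that a comparison with NULL is UNKNOWN (in SQL Server this holds under the default ANSI_NULLS ON setting). SQL NULL/UNKNOWN comes back as Python None, and TRUE/FALSE as 1/0.

```python
import sqlite3

# In-memory SQLite database, used here as an assumed stand-in for SQL Server:
# both treat a comparison with NULL as UNKNOWN (returned to Python as None).
conn = sqlite3.connect(":memory:")


def evaluate(expr):
    """Evaluate a scalar SQL expression; SQL NULL/UNKNOWN comes back as None."""
    return conn.execute(f"SELECT {expr}").fetchone()[0]


# 'x' is not in the list, but the NULL makes the whole predicate UNKNOWN.
not_in_with_null = evaluate("'x' NOT IN (NULL, 'a', 'b')")
# 'a' is in the list, so NOT IN is FALSE regardless of the NULL.
match_with_null = evaluate("'a' NOT IN (NULL, 'a', 'b')")
# Without the NULL, the same test is an ordinary TRUE.
not_in_without_null = evaluate("'x' NOT IN ('a', 'b')")
# The first conjunct of the expansion above on its own.
compare_to_null = evaluate("'x' <> NULL")

print(not_in_with_null, match_with_null, not_in_without_null, compare_to_null)
# None 0 1 None
```

Note that NOT IN is never TRUE once a NULL is in the list: it is either FALSE (a match) or UNKNOWN (no match), and a WHERE clause keeps only rows where the predicate is TRUE.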

Answer 2:

The NOT IN form behaves differently when NULLs are involved. A single NULL in the subquery's results means the NOT IN predicate can never be TRUE, so the query returns no rows (it does not raise an error; every row is simply filtered out).

So you have at least one event_id in locations that is NULL.
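To reproduce the symptom on a small scale, the sketch below builds hypothetical miniature events and locations tables (the column layout and data are invented for illustration) in an in-memory SQLite database, again an assumed stand-in for SQL Server, and runs the question's two queries. It also suggests why the summary query in the question could not reveal the problem: COUNT(DISTINCT event_id) ignores NULLs, so the NULL row is invisible to it. Filtering NULLs out of the subquery makes NOT IN agree with NOT EXISTS.

```python
import sqlite3

# Hypothetical toy data: events 1-5, locations for events 1 and 2,
# plus one location row whose event_id is NULL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events    (event_id INTEGER PRIMARY KEY);
    CREATE TABLE locations (location_id INTEGER PRIMARY KEY, event_id INTEGER);
    INSERT INTO events (event_id) VALUES (1), (2), (3), (4), (5);
    INSERT INTO locations (event_id) VALUES (1), (2), (NULL);
""")


def scalar(sql):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]


# Diagnostic: how many location rows have a NULL event_id?
null_locations = scalar("SELECT COUNT(*) FROM locations WHERE event_id IS NULL")
# COUNT(DISTINCT ...) skips NULLs, so it only sees event_ids 1 and 2.
distinct_location_events = scalar("SELECT COUNT(DISTINCT event_id) FROM locations")

# NOT EXISTS: the NULL row never satisfies l.event_id = e.event_id, so it is harmless.
not_exists = scalar("""
    SELECT COUNT(DISTINCT e.event_id)
    FROM events AS e
    WHERE NOT EXISTS (SELECT * FROM locations AS l WHERE l.event_id = e.event_id)
""")

# NOT IN: the NULL in the subquery result makes the predicate UNKNOWN for 3, 4 and 5.
not_in = scalar("""
    SELECT COUNT(DISTINCT e.event_id)
    FROM events AS e
    WHERE e.event_id NOT IN (SELECT l.event_id FROM locations AS l)
""")

# Excluding NULLs from the subquery restores the expected answer.
not_in_filtered = scalar("""
    SELECT COUNT(DISTINCT e.event_id)
    FROM events AS e
    WHERE e.event_id NOT IN (SELECT l.event_id FROM locations AS l
                             WHERE l.event_id IS NOT NULL)
""")

print(null_locations, distinct_location_events, not_exists, not_in, not_in_filtered)
# 1 2 3 0 3
```

Against the real data, the diagnostic query alone (SELECT COUNT(*) FROM locations WHERE event_id IS NULL) should be enough to confirm the diagnosis.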

Also, your query might be better written as a join:

SELECT 
    COUNT(DISTINCT e.event_id)    
FROM
    events AS e  
    LEFT JOIN locations AS l ON e.event_id = l.event_id
WHERE
    l.event_id IS NULL
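Running the LEFT JOIN version against the same kind of hypothetical toy data (in-memory SQLite, an assumed stand-in for SQL Server) suggests it is unaffected by the NULL: the NULL location row never matches any event, and events without a location come back with l.event_id set to NULL by the outer join, which is exactly what the WHERE clause keeps.

```python
import sqlite3

# Hypothetical toy data: events 1-5, locations for events 1 and 2,
# plus one location row whose event_id is NULL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events    (event_id INTEGER PRIMARY KEY);
    CREATE TABLE locations (location_id INTEGER PRIMARY KEY, event_id INTEGER);
    INSERT INTO events (event_id) VALUES (1), (2), (3), (4), (5);
    INSERT INTO locations (event_id) VALUES (1), (2), (NULL);
""")

# The anti-join from the answer: unmatched events have NULL in every l column.
left_join_count = conn.execute("""
    SELECT COUNT(DISTINCT e.event_id)
    FROM events AS e
    LEFT JOIN locations AS l ON e.event_id = l.event_id
    WHERE l.event_id IS NULL
""").fetchone()[0]

# The same anti-join, listing which events were kept.
unmatched = [row[0] for row in conn.execute("""
    SELECT e.event_id
    FROM events AS e
    LEFT JOIN locations AS l ON e.event_id = l.event_id
    WHERE l.event_id IS NULL
    ORDER BY e.event_id
""")]

print(left_join_count, unmatched)  # 3 [3, 4, 5]
```

Testing l.event_id works here because it is the join column: on every matched row it equals e.event_id and so cannot be NULL.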

[UPDATE: apparently, the NOT EXISTS version is faster.]

Answer 3:

IN and EXISTS are processed very differently.

Select * from T1 where x in ( select y from T2 )

is typically processed as:

select * 
  from t1, ( select distinct y from t2 ) t2
 where t1.x = t2.y;

Typically, the subquery is evaluated, de-duplicated (DISTINCT), indexed (or hashed or sorted), and then joined to the original table.
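The claim that the IN query returns the same rows as a join against a DISTINCT copy of the subquery can be checked on a tiny example. The tables t1(x) and t2(y) below are hypothetical, and t2 deliberately contains a duplicate so that the role of DISTINCT is visible: without it, the join would return a matching t1 row once per duplicate. This sketch (SQLite in memory) only compares results; it says nothing about which plan a given optimizer actually picks.

```python
import sqlite3

# Hypothetical tables t1(x) and t2(y); t2 contains a duplicate 2 on purpose.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (x INTEGER);
    CREATE TABLE t2 (y INTEGER);
    INSERT INTO t1 VALUES (1), (2), (3);
    INSERT INTO t2 VALUES (2), (2), (3), (4);
""")

in_form = conn.execute(
    "SELECT x FROM t1 WHERE x IN (SELECT y FROM t2) ORDER BY x").fetchall()

# The rewrite described above; the derived table is aliased d for clarity.
join_form = conn.execute("""
    SELECT t1.x
    FROM t1, (SELECT DISTINCT y FROM t2) AS d
    WHERE t1.x = d.y
    ORDER BY t1.x
""").fetchall()

# Joining without DISTINCT returns x = 2 twice, which IN never does.
plain_join = conn.execute(
    "SELECT t1.x FROM t1, t2 WHERE t1.x = t2.y ORDER BY t1.x").fetchall()

print(in_form)     # [(2,), (3,)]
print(join_form)   # [(2,), (3,)]
print(plain_join)  # [(2,), (2,), (3,)]
```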

Compare that with the EXISTS form:

select * from t1 where exists ( select null from t2 where y = x )

That is processed more like:

for x in ( select * from t1 )
   loop
      if ( exists ( select null from t2 where y = x.x ) )
      then 
         OUTPUT THE RECORD
      end if
   end loop

This always results in a full scan of T1, whereas the first query can make use of an index on T1(x).
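The two processing models can be mimicked in plain Python to make the difference concrete. This is a simplified model, not how any particular database engine is implemented: in_via_hash and exists_via_nested_loop are hypothetical names, lists stand in for tables, and NULLs are deliberately left out because Python's None == None is True, whereas SQL's NULL = NULL is UNKNOWN.

```python
# Simplified models of the two processing strategies, using plain Python lists
# in place of tables. NULLs are deliberately left out (see the lead-in).


def in_via_hash(t1_x, t2_y):
    """IN strategy: run the subquery once, de-duplicate it into a hash set,
    then probe that set once per row of t1."""
    distinct_y = set(t2_y)  # evaluated, de-duplicated, hashed
    return [x for x in t1_x if x in distinct_y]


def exists_via_nested_loop(t1_x, t2_y):
    """EXISTS strategy: for every row of t1, re-run the correlated subquery
    and output the row if it finds at least one match."""
    result = []
    for x in t1_x:                     # full pass over t1
        if any(y == x for y in t2_y):  # exists (select null from t2 where y = x)
            result.append(x)           # OUTPUT THE RECORD
    return result


t1_x = [1, 2, 3, 3]
t2_y = [2, 2, 3, 4]
print(in_via_hash(t1_x, t2_y))             # [2, 3, 3]
print(exists_via_nested_loop(t1_x, t2_y))  # [2, 3, 3]
```

Both functions return the same rows; what differs is the work done. As written, the hash version reads t2 once, while the nested-loop version rescans t2 for each row of t1.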

Source: https://stackoverflow.com/questions/2973547/why-is-my-sql-not-in-clause-producing-different-results-from-not-exists

Tags

sql

sql-server-2005