I have two tables, Institutions and Results, and I want to see whether there are any results for each institution, so that I can exclude the ones that don't have results.
Can
In cases like the one above, EXISTS tends to be faster than a JOIN. EXISTS only needs to find a single matching record before it can stop, which saves time. With a join, more records are produced and all of them have to be processed.
Whether there's a performance difference or not, you need to use what's more appropriate for your purpose. Your purpose is to get a list of Institutions (not Results; you don't need that extra data). So select the Institutions that have Results. Translation: use EXISTS.
If the RESULTS table has more than one row per INSTITUTION, EXISTS() has the added benefit of not requiring you to select distinct Institutions.
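To make the duplicate-row point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The column names follow the queries used elsewhere in this thread, but the in-memory SQLite database and the sample rows are assumptions for illustration only, not the asker's actual schema or engine.

```python
import sqlite3

# In-memory SQLite database; schema and sample rows are assumed for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Institutions (institution_id INTEGER PRIMARY KEY, institution_name TEXT);
    CREATE TABLE Results (result_id INTEGER PRIMARY KEY, institution_id INTEGER);
    INSERT INTO Institutions VALUES (1, 'Alpha'), (2, 'Beta'), (3, 'Gamma');
    -- Alpha has two results, Beta has one, Gamma has none.
    INSERT INTO Results VALUES (10, 1), (11, 1), (12, 2);
""")

# INNER JOIN: one output row per matching Results row.
join_rows = conn.execute("""
    SELECT i.institution_name
    FROM Institutions i
    INNER JOIN Results r ON r.institution_id = i.institution_id
    ORDER BY i.institution_name
""").fetchall()

# EXISTS: one output row per institution that has at least one result.
exists_rows = conn.execute("""
    SELECT i.institution_name
    FROM Institutions i
    WHERE EXISTS (SELECT 1 FROM Results r WHERE r.institution_id = i.institution_id)
    ORDER BY i.institution_name
""").fetchall()

print(join_rows)    # [('Alpha',), ('Alpha',), ('Beta',)]
print(exists_rows)  # [('Alpha',), ('Beta',)]
```

The join version needs DISTINCT (or a GROUP BY) to collapse the repeated 'Alpha'; the EXISTS version does not.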
As for performance, I have seen joins, IN(), and EXISTS() each be fastest in a variety of uses. To find the best method for your purposes you must test.
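One rough way to run such a test is sketched below with Python's sqlite3 and timeit modules. The data volume, the index name idx_results_institution, and the use of SQLite are all assumptions; timings from this toy setup say nothing about how your own database engine and data will behave, so treat it only as a template for measuring on the real system.

```python
import sqlite3
import timeit

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Institutions (institution_id INTEGER PRIMARY KEY, institution_name TEXT);
    CREATE TABLE Results (result_id INTEGER PRIMARY KEY, institution_id INTEGER);
    CREATE INDEX idx_results_institution ON Results (institution_id);
""")

# Hypothetical data: 1000 institutions; every even-numbered one gets 5 results.
conn.executemany(
    "INSERT INTO Institutions VALUES (?, ?)",
    [(i, f"Institution {i}") for i in range(1, 1001)],
)
conn.executemany(
    "INSERT INTO Results (institution_id) VALUES (?)",
    [(i,) for i in range(2, 1001, 2) for _ in range(5)],
)

# Three ways to list the institutions that have at least one result.
queries = {
    "join": """SELECT DISTINCT i.institution_id FROM Institutions i
               INNER JOIN Results r ON r.institution_id = i.institution_id""",
    "in": """SELECT i.institution_id FROM Institutions i
             WHERE i.institution_id IN (SELECT institution_id FROM Results)""",
    "exists": """SELECT i.institution_id FROM Institutions i
                 WHERE EXISTS (SELECT 1 FROM Results r
                               WHERE r.institution_id = i.institution_id)""",
}

results = {}
for name, sql in queries.items():
    # Keep the sorted result so the three forms can be checked for agreement.
    results[name] = sorted(conn.execute(sql).fetchall())
    seconds = timeit.timeit(lambda: conn.execute(sql).fetchall(), number=20)
    print(f"{name:>6}: {seconds:.4f}s for 20 runs")
```

Before comparing timings, it is worth confirming that all three forms return the same set of rows, which is why the results are kept in a dictionary.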
I'd say a JOIN is slower, because an EXISTS check can stop as soon as it finds a match, while a JOIN keeps going until it has processed every matching row.
EDIT: But it depends on the query. This is something that should be judged on a case-by-case basis.
It depends.
Ultimately the 2 serve entirely different purposes.
You JOIN 2 tables to access related records. If you don't need to access the data in the related records then you have no need to join them.
EXISTS can be used to determine if a token exists in a given dataset but won't allow you to access the related records.
Post an example of the 2 methods you have in mind and I might be able to give you a better idea.
With your two tables Institutions and Results if you want a list of institutions that have results, this query will be most efficient:
select Institutions.institution_name
from Institutions
inner join Results on (Institutions.institution_id = Results.institution_id)
If you have an institution_id and just want to know if it has results, using EXISTS might be faster:
if exists (select 1 from Results where institution_id = 2)
    print 'institution_id 2 has results'
else
    print 'institution_id 2 does not have results'
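If the check is issued from application code rather than a T-SQL batch, the same idea can be sketched as below. This assumes Python's sqlite3 module and an in-memory database with made-up rows; has_results is a hypothetical helper name, not something from the original post.

```python
import sqlite3

def has_results(conn: sqlite3.Connection, institution_id: int) -> bool:
    """Return True if at least one Results row exists for the institution."""
    # SELECT EXISTS (...) returns a single row holding 1 or 0 in SQLite.
    row = conn.execute(
        "SELECT EXISTS (SELECT 1 FROM Results WHERE institution_id = ?)",
        (institution_id,),
    ).fetchone()
    return bool(row[0])

# Assumed sample data for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Results (result_id INTEGER PRIMARY KEY, institution_id INTEGER);
    INSERT INTO Results VALUES (10, 1), (11, 2), (12, 2);
""")

for institution_id in (2, 3):
    if has_results(conn, institution_id):
        print(f"institution_id {institution_id} has results")
    else:
        print(f"institution_id {institution_id} does not have results")
```

The parameter placeholder keeps the id out of the SQL string, so the helper can be reused for any institution.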
A LEFT OUTER JOIN will tend to perform better than a NOT EXISTS**, but in your case you want to do EXISTS and using a simple INNER JOIN doesn't exactly replicate the EXISTS behavior. If you have multiple Results for an Institution, doing the INNER JOIN will return multiple rows for that institution. You could get around that by using DISTINCT, but then the EXISTS will probably be better for performance anyway.
** For those not familiar with this method:
SELECT
    T1.MyTableID
FROM
    dbo.MyTable T1
LEFT OUTER JOIN dbo.MyOtherTable T2 ON
    T2.MyTableID = T1.MyTableID
WHERE
    T2.MyOtherTableID IS NULL
is equivalent to
SELECT
    T1.MyTableID
FROM
    dbo.MyTable T1
WHERE NOT EXISTS (SELECT * FROM dbo.MyOtherTable T2 WHERE T2.MyTableID = T1.MyTableID)
assuming that MyOtherTableID is a NOT NULL column. The first method generally performs faster than the NOT EXISTS method though.
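For anyone who wants to check the equivalence on a small data set, here is a minimal sketch using Python's sqlite3 module. The table and column names follow the two queries above; the sample rows and the use of SQLite (without the dbo. schema prefix) are assumptions. It only shows that both forms return the same rows and makes no claim about which one is faster.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MyTable (MyTableID INTEGER PRIMARY KEY);
    CREATE TABLE MyOtherTable (
        MyOtherTableID INTEGER PRIMARY KEY,  -- primary key, so never NULL in a stored row
        MyTableID INTEGER
    );
    INSERT INTO MyTable VALUES (1), (2), (3);
    -- Rows 1 and 3 have related records; row 2 has none.
    INSERT INTO MyOtherTable VALUES (100, 1), (101, 1), (102, 3);
""")

# Anti-join written as LEFT OUTER JOIN ... IS NULL.
left_join_rows = conn.execute("""
    SELECT T1.MyTableID
    FROM MyTable T1
    LEFT OUTER JOIN MyOtherTable T2 ON T2.MyTableID = T1.MyTableID
    WHERE T2.MyOtherTableID IS NULL
    ORDER BY T1.MyTableID
""").fetchall()

# The same anti-join written with NOT EXISTS.
not_exists_rows = conn.execute("""
    SELECT T1.MyTableID
    FROM MyTable T1
    WHERE NOT EXISTS (SELECT * FROM MyOtherTable T2 WHERE T2.MyTableID = T1.MyTableID)
    ORDER BY T1.MyTableID
""").fetchall()

print(left_join_rows)   # [(2,)]
print(not_exists_rows)  # [(2,)]
```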