I have two tables Institutions and Results and I want to see if there are any results for institutions that way I can exclude the ones that don\'t have results.
Can
It depends on your optimizer. I tried the below two in Oracle 10g and 11g. In 10g, the second one was slightly faster. In 11g, they were identical.
However, #1 is really a misuse of the EXISTS clause. Use joins to find matches.
select *
from
table_one t1
where exists (
select *
from table_two t2
where t2.id_field = t1.id_field
)
order by t1.id_field desc
select t1.*
from
table_one t1
,table_two t2
where t1.id_field = t2.id_field
order by t1.id_field desc
If you're referring to using a left (or right) outer join or a not exists subquery, I'm fairly certain the left outer join wins performance-wise. For example:
SELECT t1.* FROM table1 t1 LEFT OUTER JOIN table2 t2 ON t1.id = t2.id WHERE t2.id IS NULL
The above should be quicker than the equivalent sub-query, and if you're referring specifically to exists - well, where structure allows, an inner join will always be the preferred option.
Are you using EXISTS as part of a correlated subquery? If so, the join will almost always be faster.
Your database should have ways of benchmarking queries. Use them to see which query runs faster.
If you want the institutions that did not have results, then a 'Where Not Exists' subquery will be faster, as it will stop as soon as it finds a single result for those that have results...
If you want the institutions With results, but you don't actually want the results, same thing. Use a 'Where Exists' subquery.. It will stop as soon as it finds a single result... This also ensures that the result set will only have one record per institution, whereas if you had an institution with multiple results, using the join approach would require that you add the 'distinct' keyword or a 'Group By' clause to eliminate the duplicate cartesion product rows that would be prodcued from the multiple Result records that matched to a single insitution.
If you need the Results, then do a JOIN - An Inner Join if you don't want to see the insitutions without results, and an outer join if you want to see ALL institutions, including the ones with no Results.
Actually, from your vague description of the problem, it sounds to me like a NOT IN query is the most obvious way to code it:
SELECT *
FROM Institutions
WHERE InstitutionID NOT IN (
SELECT DISTINCT InstitutionID
FROM Results
)
Depending on the statement, statistics and DB server it may make no difference - the same optimised query plan may be produced.
There are basically 3 ways that DBs join tables under the hood:
Nested loop - for one table much bigger than the second. Every row in the smaller table is checked for every row in the larger.
Merge - for two tables in the same sort order. Both are run through in order and matched up where they correspond.
Hash - everything else. Temporary tables are used to build up the matches.
By using exists you may effectively force the query plan to do a nested loop. This may be the quickest way, but really you want the query planner to decide.
I would say that you need to write both SQL statements and compare the query plans. You may find that they change quite a bit depending on what data you have.
For instance if [Institutions] and [Results] are similar sizes and both are clustered on InstitutionID a merge join would be quickest. If [Results] is much bigger than [Institutions] a nested loop may be quicker.