Two selects or one select + one join in SQL?

问题

The following code snippets should do the same work.

SELECT t1.* FROM table1 t1
INNER JOIN table2 t2
ON t1.ID = t2.IDService
WHERE t2.Code = @code

and

SELECT * FROM table1 t1
WHERE t1.ID IN (SELECT IDService FROM table2 WHERE Code = @code)

Which one is the best solution, in general? And computationally, is better have two nested select or is better use inner join?

EDIT: Consider that the PK of table1 is ID and the PK of table2 id the couple (IDService,Code). So, fixing the Code (using WHERE clause) and applying the clause ON to IDService, I can assume that the result of each select are the same.

回答1:

Your notion that they should do the same work is not true. Imagine this test set of data:

ID
----
1
2
3
4
5

ID
---
1
1
1
2
2
3

DDL

CREATE TABLE dbo.T1 (ID INT NOT NULL);
INSERT dbo.T1 (ID) VALUES (1), (2), (3), (4), (5);

CREATE TABLE dbo.T2 (ID INT NOT NULL);
INSERT dbo.T2 (ID) VALUES (1), (1), (1), (2), (2), (3);

SELECT  *
FROM    dbo.T1
WHERE   T1.ID IN (SELECT T2.ID FROM dbo.T2);

SELECT  T1.*
FROM    dbo.T1
        INNER JOIN dbo.T2
            ON T1.ID = T2.ID;

Results

ID
---
1
2
3

ID
---
1
1
1
2
2
3

Your results are only the same if the column you are searching in is unique.

CREATE TABLE dbo.T1 (ID INT NOT NULL);
INSERT dbo.T1 (ID) VALUES (1), (2), (3), (4), (5);

CREATE TABLE dbo.T2 (ID INT NOT NULL);
INSERT dbo.T2 (ID) VALUES (1), (2), (3);

SELECT  *
FROM    dbo.T1
WHERE   T1.ID IN (SELECT T2.ID FROM dbo.T2);

SELECT  T1.*
FROM    dbo.T1
        INNER JOIN dbo.T2
            ON T1.ID = T2.ID;

Even though the results are the same, the execution plan isn't. The first query using IN is able to use an anti-semi join, meaning it knows that the data in t2 is not needed, so as soon as it finds a single match it can stop scanning for further matches.

If you constrain your second table to have only unique values, then you will see the same plan:

CREATE TABLE dbo.T1 (ID INT NOT NULL PRIMARY KEY);
INSERT dbo.T1 (ID) VALUES (1), (2), (3), (4), (5);

CREATE TABLE dbo.T2 (ID INT NOT NULL PRIMARY KEY);
INSERT dbo.T2 (ID) VALUES (1), (2), (3);

SELECT  *
FROM    dbo.T1
WHERE   T1.ID IN (SELECT T2.ID FROM dbo.T2);

SELECT  T1.*
FROM    dbo.T1
        INNER JOIN dbo.T2
            ON T1.ID = T2.ID;

In summary, the two queries will not always yield the same results, and they will not always have the same plan. It really depends on your Indexes and the width of your data/query.

回答2:

As a very general rule a JOIN almost always performs better than a SUB-QUERY, there are exceptions.

If you are to use a subquery, an EXIST clause, usually performs better than IN for the majority of use cases.

For Test Cases, you can look into this site

You Should Go with the following rule...

If you need data that is from more than one table, then you can always go with join.
You can use subqueries if you require more than one query and each subquery provides a subset of the table involved in the query.
If the query requires a NOT EXISTS condition, then you must use a subquery because NOT EXISTS operates only in a subquery; the same principle holds true for the EXISTS condition.
PROCEDURE SQL query optimizer changes some subqueries to joins, a join is generally more efficient to process.

来源：https://stackoverflow.com/questions/19835485/two-selects-or-one-select-one-join-in-sql

标签

sql

sql-server-2008

join

inner-join