I am an old-school MySQL user and have always preferred JOIN over sub-query. But nowadays everyone uses sub-query, and I hate it; I don\'t know why.
Subqueries have ability to calculate aggregation functions on a fly. E.g. Find minimal price of the book and get all books which are sold with this price. 1) Using Subqueries:
SELECT titles, price
FROM Books, Orders
WHERE price =
(SELECT MIN(price)
FROM Orders) AND (Books.ID=Orders.ID);
2) using JOINs
SELECT MIN(price)
FROM Orders;
-----------------
2.99
SELECT titles, price
FROM Books b
INNER JOIN Orders o
ON b.ID = o.ID
WHERE o.price = 2.99;
MSDN Documentation for SQL Server says
Many Transact-SQL statements that include subqueries can be alternatively formulated as joins. Other questions can be posed only with subqueries. In Transact-SQL, there is usually no performance difference between a statement that includes a subquery and a semantically equivalent version that does not. However, in some cases where existence must be checked, a join yields better performance. Otherwise, the nested query must be processed for each result of the outer query to ensure elimination of duplicates. In such cases, a join approach would yield better results.
so if you need something like
select * from t1 where exists select * from t2 where t2.parent=t1.id
try to use join instead. In other cases, it makes no difference.
I say: Creating functions for subqueries eliminate the problem of cluttter and allows you to implement additional logic to subqueries. So I recommend creating functions for subqueries whenever possible.
Clutter in code is a big problem and the industry has been working on avoiding it for decades.
In most cases JOINs are faster than sub-queries and it is very rare for a sub-query to be faster.
In JOINs RDBMS can create an execution plan that is better for your query and can predict what data should be loaded to be processed and save time, unlike the sub-query where it will run all the queries and load all their data to do the processing.
The good thing in sub-queries is that they are more readable than JOINs: that's why most new SQL people prefer them; it is the easy way; but when it comes to performance, JOINS are better in most cases even though they are not hard to read too.
I just thinking about the same problem, but I am using subquery in the FROM part. I need connect and query from large tables, the "slave" table have 28 million record but the result is only 128 so small result big data! I am using MAX() function on it.
First I am using LEFT JOIN because I think that is the correct way, the mysql can optimalize etc. Second time just for testing, I rewrite to sub-select against the JOIN.
LEFT JOIN runtime: 1.12s SUB-SELECT runtime: 0.06s
18 times faster the subselect than the join! Just in the chokito adv. The subselect looks terrible but the result ...
As per my observation like two cases, if a table has less then 100,000 records then the join will work fast.
But in the case that a table has more than 100,000 records then a subquery is best result.
I have one table that has 500,000 records on that I created below query and its result time is like
SELECT *
FROM crv.workorder_details wd
inner join crv.workorder wr on wr.workorder_id = wd.workorder_id;
Result : 13.3 Seconds
select *
from crv.workorder_details
where workorder_id in (select workorder_id from crv.workorder)
Result : 1.65 Seconds
The difference is only seen when the second joining table has significantly more data than the primary table. I had an experience like below...
We had a users table of one hundred thousand entries and their membership data (friendship) about 3 hundred thousand entries. It was a join statement in order to take friends and their data, but with a great delay. But it was working fine where there was only a small amount of data in the membership table. Once we changed it to use a sub-query it worked fine.
But in the mean time the join queries are working with other tables that have fewer entries than the primary table.
So I think the join and sub query statements are working fine and it depends on the data and the situation.