Question
I know that many threads have been created here and elsewhere on the internet about this topic. But I really can't get the final point on the difference between the two statements! I mean, by trying and trying I can reach all the results I need with my queries, but I really don't have full control of the knife!
I consider myself a very good programmer and a very good SQL-ista, and I feel a little ashamed about this...
Here's an example:
- I have a table with the pages of a website ("web_page")
- a table with the categories ("category").
- a category can contain one or more pages, but not vice versa
- a category may contain NO pages at all
- a page can be visible or not in the website
So if I want to show all the categories and their pages, I mean both categories with pages and without, I have to do something like this:
FROM category
LEFT JOIN web_page ON ( web_page.category_id = category.category_id AND web_page.active = "Y" )
So if a category has no pages, I'll see web_page_id NULL on the record of that category.
But if I do:
FROM category
LEFT JOIN web_page ON ( web_page.category_id = category.category_id )
...
WHERE web_page.active = "Y"...
I'll select only the categories that have at least one web_page... But WHY?
This was just an example... I'd like to understand this difference once and for all!
Thank you.
Answer 1:
To make your query work as you intended, put the condition into the ON clause:
FROM category
LEFT JOIN web_page ON web_page.category_id = category.category_id
AND web_page.active = "Y"
The reason this works is that the WHERE clause filters the rows after they are joined. If the join doesn't match a web page row (because the category had no web pages), then all the columns of web_page will be null, and comparing a null to a value (like "Y") is never true, so those non-joining rows are filtered out.
However, by moving the condition into the ON clause, the condition is evaluated as the join is made, so that you only join rows that have active = "Y", but if there aren't any such rows, you'll just get the left join's null web page.
This version of the query is really saying: "give me all categories and their active web pages (if any)".
Note that this is standard SQL behavior rather than a quirk of one engine: MySQL, for example, filters out the null rows in exactly the same way when the condition is in the WHERE clause.
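To see the two placements side by side, here is a minimal sketch that runs both queries against an in-memory SQLite database through Python's sqlite3 module. The engine choice, the sample rows and the category.name column are assumptions made for illustration; only the table names, category_id, web_page_id and active come from the question.

```python
import sqlite3

# Hypothetical sample data: "News" has one active page, "Empty" has no pages,
# "Archive" has only an inactive page.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE category (category_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE web_page (
        web_page_id INTEGER PRIMARY KEY,
        category_id INTEGER,
        active TEXT
    );
    INSERT INTO category VALUES (1, 'News'), (2, 'Empty'), (3, 'Archive');
    INSERT INTO web_page VALUES (10, 1, 'Y'), (11, 3, 'N');
""")

# Condition in ON: it only decides which pages may match a category,
# so every category is still returned.
on_rows = conn.execute("""
    SELECT category.name, web_page.web_page_id
    FROM category
    LEFT JOIN web_page
           ON web_page.category_id = category.category_id
          AND web_page.active = 'Y'
    ORDER BY category.category_id
""").fetchall()

# Condition in WHERE: applied after the join, so rows whose web_page
# columns are NULL (or 'N') are removed.
where_rows = conn.execute("""
    SELECT category.name, web_page.web_page_id
    FROM category
    LEFT JOIN web_page
           ON web_page.category_id = category.category_id
    WHERE web_page.active = 'Y'
    ORDER BY category.category_id
""").fetchall()

print(on_rows)     # [('News', 10), ('Empty', None), ('Archive', None)]
print(where_rows)  # [('News', 10)]
```

With the condition in ON, all three categories come back and the ones without an active page carry a NULL (None) page id. With the condition in WHERE, only the category that has an active page survives.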
Answer 2:
This happens because SQL is processed in stages, in this order:
- FROM clause (all joins are here);
- WHERE clause;
- GROUP BY clause;
- Window functions (not related to MySQL, though);
- ORDER BY clause.
So it really matters whether you filter on web_page.active='Y' or join with the same condition. In the former case, the join is done and you just filter the results, converting your OUTER join into an INNER one. In the latter case, you achieve the desired result, because non-matching rows produce NULL values for the corresponding columns.
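The staging can be made visible by looking at the joined rows before any filter is applied. The sketch below is an illustration only: it assumes SQLite via Python's sqlite3 module, made-up sample rows and a hypothetical category.name column, and it imitates the WHERE stage with a Python list comprehension rather than showing how any engine is implemented.

```python
import sqlite3

# Hypothetical sample data: "News" has an active and an inactive page,
# "Empty" has no pages.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE category (category_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE web_page (
        web_page_id INTEGER PRIMARY KEY,
        category_id INTEGER,
        active TEXT
    );
    INSERT INTO category VALUES (1, 'News'), (2, 'Empty');
    INSERT INTO web_page VALUES (10, 1, 'Y'), (11, 1, 'N');
""")

# Stage 1 (FROM with its joins): every category survives,
# and unmatched categories get NULLs for the web_page columns.
joined = conn.execute("""
    SELECT category.name, web_page.web_page_id, web_page.active
    FROM category
    LEFT JOIN web_page ON web_page.category_id = category.category_id
    ORDER BY category.category_id, web_page.web_page_id
""").fetchall()

# Stage 2 (WHERE), imitated in Python: keep only rows where the test holds.
# A NULL arrives as None, and None == 'Y' is False in Python, which has the
# same filtering effect as SQL's "unknown".
filtered = [(name, page_id) for name, page_id, active in joined if active == 'Y']

# The same filter written in SQL, once with LEFT JOIN and once with INNER JOIN.
query = """
    SELECT category.name, web_page.web_page_id
    FROM category
    {join} JOIN web_page ON web_page.category_id = category.category_id
    WHERE web_page.active = 'Y'
    ORDER BY category.category_id, web_page.web_page_id
"""
left_where = conn.execute(query.format(join="LEFT")).fetchall()
inner_where = conn.execute(query.format(join="INNER")).fetchall()

print(joined)       # [('News', 10, 'Y'), ('News', 11, 'N'), ('Empty', None, None)]
print(filtered)     # [('News', 10)]
print(left_where == inner_where)  # True
```

The 'Empty' row exists after the join stage and disappears only in the WHERE stage, which is why the LEFT JOIN query ends up returning the same rows as the INNER JOIN one.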
Answer 3:
Trying to help you on the understanding part.
Consider it a facet of the SQL language that when the criteria are specified within the LEFT JOIN, they apply when finding the records to match against.
When the criteria are specified in the WHERE clause at the bottom, they apply to all of the records, after the join has occurred. This has the unintended side effect of changing the LEFT JOIN to an INNER JOIN, as you have seen.
You can get around this with a WHERE clause like this:
WHERE COALESCE(web_page.active,"Y") = "Y"
But that isn't actually guaranteed to give the same results (a category whose pages are all inactive is dropped entirely instead of appearing with a NULL page), so the proper way to do it is to keep that criterion in the ON clause of the JOIN.
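The difference between the COALESCE workaround and the ON version shows up as soon as a category has pages but none of them is active. A minimal sketch, assuming SQLite through Python's sqlite3 module, with hypothetical sample rows and a hypothetical category.name column:

```python
import sqlite3

# Hypothetical sample data: "Archive" has pages, but none of them is active.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE category (category_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE web_page (
        web_page_id INTEGER PRIMARY KEY,
        category_id INTEGER,
        active TEXT
    );
    INSERT INTO category VALUES (1, 'News'), (2, 'Empty'), (3, 'Archive');
    INSERT INTO web_page VALUES (10, 1, 'Y'), (11, 3, 'N');
""")

# Workaround: NULL-extended rows pass the filter because COALESCE turns
# their NULL into 'Y', but a real 'N' row is still rejected.
coalesce_rows = conn.execute("""
    SELECT category.name, web_page.web_page_id
    FROM category
    LEFT JOIN web_page ON web_page.category_id = category.category_id
    WHERE COALESCE(web_page.active, 'Y') = 'Y'
    ORDER BY category.category_id
""").fetchall()

# Condition kept in ON: the inactive page simply fails to match,
# and its category is returned with a NULL page.
on_rows = conn.execute("""
    SELECT category.name, web_page.web_page_id
    FROM category
    LEFT JOIN web_page
           ON web_page.category_id = category.category_id
          AND web_page.active = 'Y'
    ORDER BY category.category_id
""").fetchall()

print(coalesce_rows)  # [('News', 10), ('Empty', None)]
print(on_rows)        # [('News', 10), ('Empty', None), ('Archive', None)]
```

The workaround keeps the empty category but loses 'Archive', whose only page is inactive, while the ON version still lists it.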
Source: https://stackoverflow.com/questions/11329683/left-and-inner-join-difference-once-forever