Is it better to use INNER JOIN or EXISTS to find belonging to several in m2m relation?

Question

Given a many-to-many relation between items and categories, I have three tables:

items,
categories and
items_categories, which holds references to both

I want to find the items that belong to at least one category from each of several given sets:

Find Item 
belonging to a category in [1,3,6] 
and belonging to a category in [7,8,4] 
and belonging to a category in [12,66,42]
and ...

There are two ways I can think of to accomplish this in MySQL.

OPTION A: INNER JOIN:

SELECT items.id FROM items
INNER JOIN items_categories c1 ON (items.id = c1.item_id)
INNER JOIN items_categories c2 ON (items.id = c2.item_id)
INNER JOIN items_categories c3 ON (items.id = c3.item_id)
...
WHERE
c1.category_id IN (1,3,6) AND
c2.category_id IN (7,8,4) AND
c3.category_id IN (12,66,42) AND
...;

OPTION B: EXISTS:

SELECT id FROM items
WHERE
EXISTS(SELECT category_id FROM items_categories WHERE items_categories.item_id = items.id AND category_id IN (1,3,6)) AND
EXISTS(SELECT category_id FROM items_categories WHERE items_categories.item_id = items.id AND category_id IN (7,8,4)) AND
EXISTS(SELECT category_id FROM items_categories WHERE items_categories.item_id = items.id AND category_id IN (12,66,42)) AND
...;

Both options work. The question is: which is faster for a large items table? Or is there an OPTION C I am missing?

Answer 1:

OPTION A

JOIN has an advantage over EXISTS because it uses the indexes more efficiently, especially on large tables.
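Whichever form is chosen, both probe the junction table by item and category, so the indexes on that table matter. A minimal sketch of a schema that supports both lookups, assuming the junction table is named items_categories with item_id and category_id columns as in the question (the index name is hypothetical):

```sql
-- Hypothetical schema; column names follow the question's queries.
CREATE TABLE IF NOT EXISTS items (
    id INT PRIMARY KEY
);

CREATE TABLE IF NOT EXISTS items_categories (
    item_id     INT NOT NULL,
    category_id INT NOT NULL,
    -- Serves lookups of the form "does item X have category Y?"
    PRIMARY KEY (item_id, category_id),
    -- Serves lookups of the form "which items have category Y?"
    KEY idx_category_item (category_id, item_id)
);
```

With both column orders indexed, the optimizer can start either from items or from a category list, whichever it estimates to be cheaper.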

Answer 2:

A JOIN is more efficient, generally speaking.

However, one thing to be aware of is that joins can produce duplicate rows in your output. For example, if item id 123 was in categories 1 and 3, the first JOIN would result in two rows for id 123. If item id 999 was in categories 1, 3, 7, 8, 12, and 66, you would get eight rows for 999 in your results (2*2*2).

Duplicate rows are something you need to be aware of and handle. In this case, you could just use SELECT DISTINCT id .... Eliminating duplicates can get more complicated with a complex query, though.
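The item 999 case can be reproduced with a small sketch. The table and column names below are assumptions taken from the question's description (items, items_categories, item_id, category_id):

```sql
-- Hypothetical schema and data, for illustration only.
CREATE TABLE IF NOT EXISTS items (id INT PRIMARY KEY);
CREATE TABLE IF NOT EXISTS items_categories (
    item_id     INT NOT NULL,
    category_id INT NOT NULL,
    PRIMARY KEY (item_id, category_id)
);

INSERT INTO items (id) VALUES (999);
INSERT INTO items_categories (item_id, category_id) VALUES
    (999, 1), (999, 3), (999, 7), (999, 8), (999, 12), (999, 66);

-- Without DISTINCT: one row per combination of matching categories,
-- so 999 is returned eight times (2 * 2 * 2).
SELECT items.id
FROM items
INNER JOIN items_categories c1 ON items.id = c1.item_id
INNER JOIN items_categories c2 ON items.id = c2.item_id
INNER JOIN items_categories c3 ON items.id = c3.item_id
WHERE c1.category_id IN (1, 3, 6)
  AND c2.category_id IN (7, 8, 4)
  AND c3.category_id IN (12, 66, 42);

-- With DISTINCT: 999 is returned once.
SELECT DISTINCT items.id
FROM items
INNER JOIN items_categories c1 ON items.id = c1.item_id
INNER JOIN items_categories c2 ON items.id = c2.item_id
INNER JOIN items_categories c3 ON items.id = c3.item_id
WHERE c1.category_id IN (1, 3, 6)
  AND c2.category_id IN (7, 8, 4)
  AND c3.category_id IN (12, 66, 42);
```

The EXISTS form never multiplies rows, because each subquery only answers yes or no for the current item.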

Answer 3:

You are using a JOIN in Option A and a sub-query in Option B. The difference is:

In most cases JOINs are faster than sub-queries, and it is very rare for a sub-query to be faster.

With JOINs, the RDBMS can build an execution plan that suits your query better and can predict which data needs to be loaded, which saves time. With a sub-query, it may instead run every query and load all of its data before processing.

The good thing about sub-queries is that they are more readable than JOINs, which is why most people new to SQL prefer them; it is the easy way. When it comes to performance, though, JOINs are better in most cases, and they are not hard to read either.
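Rather than relying on a general rule, the plan the server actually picks can be inspected with EXPLAIN. The sketch below assumes the items and items_categories tables described in the question and uses two category sets for brevity; on MySQL 8.0.18 or later, EXPLAIN ANALYZE can be used instead to get measured timings.

```sql
-- Hypothetical schema, so the statements below can run on their own.
CREATE TABLE IF NOT EXISTS items (id INT PRIMARY KEY);
CREATE TABLE IF NOT EXISTS items_categories (
    item_id     INT NOT NULL,
    category_id INT NOT NULL,
    PRIMARY KEY (item_id, category_id)
);

-- Plan for the JOIN form (Option A).
EXPLAIN
SELECT DISTINCT items.id
FROM items
INNER JOIN items_categories c1 ON items.id = c1.item_id
INNER JOIN items_categories c2 ON items.id = c2.item_id
WHERE c1.category_id IN (1, 3, 6)
  AND c2.category_id IN (7, 8, 4);

-- Plan for the EXISTS form (Option B).
EXPLAIN
SELECT items.id
FROM items
WHERE EXISTS (SELECT 1 FROM items_categories ic
              WHERE ic.item_id = items.id AND ic.category_id IN (1, 3, 6))
  AND EXISTS (SELECT 1 FROM items_categories ic
              WHERE ic.item_id = items.id AND ic.category_id IN (7, 8, 4));
```

Comparing the key and rows columns of the two outputs shows whether both forms use the same index and how many rows each is expected to examine on your data.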

Answer 4:

select distinct `user_posts_id`
from `user_posts_boxes`
where `user_id` = 5
  and exists (select * from `box`
              where `user_posts_boxes`.`box_id` = `box`.`id`
                and `status` in ("A","F"))
order by `user_posts_id` desc
limit 200;

select distinct `user_posts_id`
from `user_posts_boxes`
inner join `box` on `box`.`id` = `user_posts_boxes`.`box_id`
  and `box`.`status` in ("A","F")
  and `box`.`user_id` = 5
order by `user_posts_id` desc
limit 200;

I tried both queries, and the first one (EXISTS) runs faster for me. Both tables hold large datasets: "user_posts_boxes" has almost 4 million rows and "box" about 1.5 million.

The first query took 0.147 ms; the second took roughly 0.5 to 0.9 ms.

Note that my tables are InnoDB and have physical relationships (foreign keys) defined.

So I would go with EXISTS, but it also depends on how your database is structured.

Source: https://stackoverflow.com/questions/13063772/is-it-better-to-use-inner-join-or-exists-to-find-belonging-to-several-in-m2m-rel

Tags

mysql

sql

performance

many-to-many