I'm trying to find the most efficient way of dealing with this, but I must tell you up front that I've made a mess of it. I looked around SO and found nothing of relevance, so he
Something like this?
    SELECT *
    FROM projects AS L
    WHERE
        EXISTS (
            SELECT 1
            FROM
                projects_to_tags PT
                INNER JOIN projects_to_tags PT2 ON PT.tag_id = PT2.tag_id
            WHERE
                L.num = PT.project_id
                AND PT2.project_id = 4
                AND PT2.project_id <> L.num
        )
In the execution plan, that's 2 seeks and a scan.
UPDATE
Taking a page from jdelard's book, one tiny modification makes my query outperform his. I'm testing on SQL Server, where I had to replace his GROUP BY with a DISTINCT, so YMMV on MySQL:
    SELECT *
    FROM projects AS L
    WHERE
        L.num != 4 -- instead of <> PT2.project_id inside
        AND EXISTS (
            SELECT 1
            FROM
                projects_to_tags PT
                INNER JOIN projects_to_tags PT2 ON PT.tag_id = PT2.tag_id
            WHERE
                L.num = PT.project_id
                AND PT2.project_id = 4
        )
The improvement over his query comes from two things: there is no DISTINCT or aggregate, and the EXISTS is a semi join rather than a complete join, so the engine only has to establish that a match exists instead of joining every matching row. Otherwise, the two queries are semantically largely the same.
I will have to remember jdelard's trick, as it is a very useful tool. For some reason the query engine was not smart enough to infer that {a = 4, a != b} implies {b != 4}.
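To check the semi join versus complete join point on a toy data set, here is a minimal sketch using Python's built-in sqlite3 module. SQLite only stands in for SQL Server to compare result sets, so it says nothing about the plans or timings above. The column names num, project_id and tag_id come from the queries in this answer; the sample rows are invented, and the JOIN ... DISTINCT query is an assumed reconstruction of a complete-join formulation, not jdelard's exact query.

```python
import sqlite3

# In-memory database with made-up sample data (illustration only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE projects (num INTEGER PRIMARY KEY);
    CREATE TABLE projects_to_tags (project_id INTEGER, tag_id INTEGER);
    INSERT INTO projects (num) VALUES (1), (2), (3), (4), (5);
    INSERT INTO projects_to_tags (project_id, tag_id) VALUES
        (4, 10), (4, 20),   -- the reference project has tags 10 and 20
        (1, 10),            -- shares one tag with project 4
        (2, 10), (2, 20),   -- shares two tags with project 4
        (3, 30);            -- shares nothing; project 5 has no tags
""")

# Semi join: EXISTS only asks whether at least one shared tag exists.
EXISTS_SQL = """
    SELECT L.num
    FROM projects AS L
    WHERE L.num != 4
      AND EXISTS (
          SELECT 1
          FROM projects_to_tags PT
          INNER JOIN projects_to_tags PT2 ON PT.tag_id = PT2.tag_id
          WHERE L.num = PT.project_id
            AND PT2.project_id = 4
      )
    ORDER BY L.num
"""

# Complete join (assumed formulation): one output row per shared tag.
JOIN_SQL = """
    SELECT {distinct} L.num
    FROM projects AS L
    INNER JOIN projects_to_tags PT ON PT.project_id = L.num
    INNER JOIN projects_to_tags PT2 ON PT2.tag_id = PT.tag_id
    WHERE PT2.project_id = 4
      AND L.num != 4
    ORDER BY L.num
"""


def nums(sql):
    """Run a query and return the first column of every row as a list."""
    return [row[0] for row in conn.execute(sql)]


semi_join = nums(EXISTS_SQL)
join_raw = nums(JOIN_SQL.format(distinct=""))
join_distinct = nums(JOIN_SQL.format(distinct="DISTINCT"))

print("EXISTS (semi join):  ", semi_join)
print("JOIN without DISTINCT:", join_raw)
print("JOIN with DISTINCT:   ", join_distinct)
```

On this data the complete join returns project 2 once per shared tag, which is why it needs the DISTINCT (or GROUP BY) that the EXISTS version avoids.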