I am having trouble while trying to understand the concept of semi-join and how it is different from conventional join. I have tried some article already but not satisfied w
As far as I know SQL dialects that support SEMIJOIN/ANTISEMI are U-SQL/Cloudera Impala.
SEMIJOIN:
Semijoins are U-SQL’s way filter a rowset based on the inclusion of its rows in another rowset. Other SQL dialects express this with the SELECT * FROM A WHERE A.key IN (SELECT B.key FROM B) pattern.
More info Semi Join and Anti Join Should Have Their Own Syntax in SQL:
“Semi” means that we don’t really join the right hand side, we only check if a join would yield results for any given tuple.
-- IN
SELECT *
FROM Employee
WHERE DeptName IN (
SELECT DeptName
FROM Dept
)
-- EXISTS
SELECT *
FROM Employee
WHERE EXISTS (
SELECT 1
FROM Dept
WHERE Employee.DeptName = Dept.DeptName
)
EDIT:
Another dialect that supports SEMI/ANTISEMI join is KQL:
kind=leftsemi (or kind=rightsemi)
Returns all the records from the left side that have matches from the right. The result table contains columns from the left side only.
let t1 = datatable(key:long, value:string)
[1, "a",
2, "b",
3, "c"];
let t2 = datatable(key:long)
[1,3];
t1 | join kind=leftsemi (t2) on key
demo
Output:
key value
1 a
3 c