I have a very narrow table: user_id, ancestry.
The user_id column is self-explanatory.
The ancestry column contains the country from where the user's ancest
This question is some years old, but I arrived here via a duplicate and want to suggest a more general solution. If you know you always have a fixed number of ancestors, you can use self joins, as already suggested in the other answers. If you want a generic approach, read on.
What you need here is called the quotient in relational algebra (the operation itself is also known as relational division). The quotient is more or less the reversal of the Cartesian product (a cross join in SQL).
Let's say your ancestor set A is (I use table notation here because I think it is easier to follow):
ancestry
-----------
'England'
'France'
'Germany'
and your user set U is
user_id
--------
1
2
3
The Cartesian product C = U x A is then:
user_id | ancestry
---------+-----------
1 | 'England'
1 | 'France'
1 | 'Germany'
2 | 'England'
2 | 'France'
2 | 'Germany'
3 | 'England'
3 | 'France'
3 | 'Germany'
If you calculate the set quotient C/A, you get U back:
user_id
--------
1
2
3
If you redo the Cartesian product U x A, you get C again. But note that for an arbitrary set T, (T/A) x A will not necessarily reproduce T. For example, if T is
user_id | ancestry
---------+-----------
1 | 'England'
1 | 'France'
1 | 'Germany'
2 | 'England'
2 | 'France'
then (T/A) is
user_id
--------
1
(T/A)xA will then be
user_id | ancestry
---------+------------
1 | 'England'
1 | 'France'
1 | 'Germany'
Note that the records for user_id=2 have been eliminated by the Quotient and Cartesian Product operations.
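The example above can be replayed outside the database. This is only a minimal sketch in Python with plain sets; the helper names quotient and product are made up for illustration and are not part of any library.

```python
def quotient(t, a):
    """Relational division T / A: user_ids that are paired with every element of a."""
    users = {user_id for user_id, _ in t}
    return {u for u in users if all((u, x) in t for x in a)}


def product(u, a):
    """Cartesian product U x A as a set of (user_id, ancestry) pairs."""
    return {(user_id, x) for user_id in u for x in a}


A = {"England", "France", "Germany"}
T = {
    (1, "England"), (1, "France"), (1, "Germany"),
    (2, "England"), (2, "France"),  # user 2 has no 'Germany' row
}

t_div_a = quotient(T, A)       # only user 1 is paired with all of A
rebuilt = product(t_div_a, A)  # (T/A) x A, which no longer contains user 2
```

Here t_div_a contains only user 1, and rebuilt is a proper subset of T, which matches the tables above.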
Your question is: Which user_id has ancestors from all countries in your ancestor set? In other words you want U=T/A where T is your original set (or your table).
To implement the quotient in SQL you need four steps, so let's go through them one by one.
I will use T-SQL syntax (Microsoft SQL Server), but it should be easy to adapt to other DBMSs. I call the table (user_id, ancestry) ancestor, and I keep the set of required countries in a second table, ancestry_set:
CREATE TABLE ancestry_set (ancestry nvarchar(25))
INSERT INTO ancestry_set (ancestry) VALUES ('England')
INSERT INTO ancestry_set (ancestry) VALUES ('France')
INSERT INTO ancestry_set (ancestry) VALUES ('Germany')
CREATE TABLE ancestor ([user_id] int, ancestry nvarchar(25))
INSERT INTO ancestor ([user_id],ancestry) VALUES (1,'England')
INSERT INTO ancestor ([user_id],ancestry) VALUES(1,'Ireland')
INSERT INTO ancestor ([user_id],ancestry) VALUES(2,'France')
INSERT INTO ancestor ([user_id],ancestry) VALUES(3,'Germany')
INSERT INTO ancestor ([user_id],ancestry) VALUES(3,'Poland')
INSERT INTO ancestor ([user_id],ancestry) VALUES(4,'England')
INSERT INTO ancestor ([user_id],ancestry) VALUES(4,'France')
INSERT INTO ancestor ([user_id],ancestry) VALUES(4,'Germany')
INSERT INTO ancestor ([user_id],ancestry) VALUES(5,'France')
INSERT INTO ancestor ([user_id],ancestry) VALUES(5,'Germany')
1) Create the Cartesian Product of your ancestry set and the set of all user_ids.
SELECT a.[user_id],s.ancestry
FROM ancestor a, ancestry_set s
GROUP BY a.[user_id],s.ancestry
2) Find all records in the Cartesian product which have no partner in the original set (using a LEFT JOIN), and
3) Extract the user_ids from the result set of 2)
SELECT DISTINCT cp.[user_id]
FROM (SELECT a.[user_id],s.ancestry
FROM ancestor a, ancestry_set s
GROUP BY a.[user_id],s.ancestry) cp
LEFT JOIN ancestor a ON cp.[user_id]=a.[user_id] AND cp.ancestry=a.ancestry
WHERE a.[user_id] is null
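To see what steps 1 to 3 return for the sample data, here is a rough sketch that runs the same query through Python's standard sqlite3 module. SQLite instead of SQL Server is my assumption here, so the column types are simplified and the T-SQL brackets are dropped; the query also keeps the ancestry column so the missing pairs are visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ancestry_set (ancestry TEXT);
    INSERT INTO ancestry_set VALUES ('England'), ('France'), ('Germany');
    CREATE TABLE ancestor (user_id INTEGER, ancestry TEXT);
    INSERT INTO ancestor VALUES
        (1,'England'), (1,'Ireland'), (2,'France'), (3,'Germany'), (3,'Poland'),
        (4,'England'), (4,'France'), (4,'Germany'), (5,'France'), (5,'Germany');
""")

# Steps 1 and 2: pairs from the Cartesian product that have no partner in ancestor.
missing = conn.execute("""
    SELECT cp.user_id, cp.ancestry
    FROM (SELECT a.user_id, s.ancestry
          FROM ancestor a, ancestry_set s
          GROUP BY a.user_id, s.ancestry) cp
    LEFT JOIN ancestor a ON cp.user_id = a.user_id AND cp.ancestry = a.ancestry
    WHERE a.user_id IS NULL
    ORDER BY cp.user_id, cp.ancestry
""").fetchall()

# Step 3: the user_ids that lack at least one country from ancestry_set.
incomplete_users = sorted({user_id for user_id, _ in missing})
print(incomplete_users)  # [1, 2, 3, 5]
```

User 4 is the only one that does not show up, because it is the only user with a row for every country in ancestry_set.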
4) Return all user_ids from the original set which are not included in the result set of 3)
SELECT DISTINCT [user_id]
FROM ancestor
WHERE [user_id] NOT IN (
SELECT DISTINCT cp.[user_id]
FROM (SELECT a.[user_id],s.ancestry
FROM ancestor a, ancestry_set s
GROUP BY a.[user_id],s.ancestry) cp
LEFT JOIN ancestor a ON cp.[user_id]=a.[user_id] AND cp.ancestry=a.ancestry
WHERE a.[user_id] is null
)
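As a sanity check, the final query can be run against the sample data. The sketch below again uses Python's sqlite3 module (an assumption on my side, not SQL Server) and compares the result with a different, commonly used formulation of relational division that counts matching countries with GROUP BY and HAVING. That counting variant is not the four-step method described above; it is only there as a cross-check.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ancestry_set (ancestry TEXT);
    INSERT INTO ancestry_set VALUES ('England'), ('France'), ('Germany');
    CREATE TABLE ancestor (user_id INTEGER, ancestry TEXT);
    INSERT INTO ancestor VALUES
        (1,'England'), (1,'Ireland'), (2,'France'), (3,'Germany'), (3,'Poland'),
        (4,'England'), (4,'France'), (4,'Germany'), (5,'France'), (5,'Germany');
""")

# Step 4: users that never appear among the incomplete ones.
four_step = conn.execute("""
    SELECT DISTINCT user_id
    FROM ancestor
    WHERE user_id NOT IN (
        SELECT DISTINCT cp.user_id
        FROM (SELECT a.user_id, s.ancestry
              FROM ancestor a, ancestry_set s
              GROUP BY a.user_id, s.ancestry) cp
        LEFT JOIN ancestor a ON cp.user_id = a.user_id AND cp.ancestry = a.ancestry
        WHERE a.user_id IS NULL)
""").fetchall()

# Cross-check: keep users whose number of matched countries equals the size of the set.
counting = conn.execute("""
    SELECT a.user_id
    FROM ancestor a
    JOIN ancestry_set s ON s.ancestry = a.ancestry
    GROUP BY a.user_id
    HAVING COUNT(DISTINCT a.ancestry) = (SELECT COUNT(DISTINCT ancestry) FROM ancestry_set)
""").fetchall()

print(four_step, counting)  # [(4,)] [(4,)]
```

Both queries return only user_id 4 for the sample data.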