问题
I have two datasets:
df1 = pd.DataFrame(data = {'label1': ['A', 'A', 'B', 'C'], 'label2': ['a', 'b', 'c', 'd'], 'value': [1,2,3,4]})
df2 = pd.DataFrame(data = {'label1': ['A', 'A', 'D', 'E'], 'label'2': ['a', 'd', 'c','e'], 'value2': [10,12,23,14]})
I would like to perform an anti-join so that the resulting data frame contains the rows of df1 where the key [['label1', 'label2']] is not found in df2.
The resulting df should be:
label1 label2 value
A b 2
B c 3
C d 4
In R using dplyr, the code would be:
df3 = anti_join(df1, df2, by = c("label1", "label2"))
Thanks for your help.
回答1:
Using isin
with tuple
df1[~df1[['label1','label2']].apply(tuple,1).isin(df2[['label1','label2']].apply(tuple,1))]
Out[140]:
label1 label2 value
1 A b 2
2 B c 3
3 C d 4
回答2:
Option1
Simply perform an inner join and remove the intersecting rows from df1
.
df1.drop(df1.merge(df2).index)
Option2:
You need to do a left join and see how many of the rows shows up as null for the column from df2
.
mer = df1.merge(df2, how='left')
mer[mer['value2'].isnull()].drop(['value2'], 1)
Output:
label1 label2 value
1 A b 2
2 B c 3
3 C d 4
来源:https://stackoverflow.com/questions/51215628/how-to-perform-an-anti-join-or-left-outer-join-get-all-the-rows-in-a-dataset