I have two separate pandas dataframes (df1 and df2) which have multiple columns, but only one in common ('text').
I would like to do fin
You can merge them and keep only the lines that have a NaN.
merged = pd.merge(df1, df2, how='outer')
merged[merged.isnull().any(axis=1)]  # rows left unmatched on either side
Or you can use isin:
df2[~df2.text.isin(df1.text)]
As you asked, you can do this efficiently using isin (without resorting to expensive merges).
>>> df2[~df2.text.isin(df1.text.values)]
C D text
0 0.5 2 shot
1 0.3 2 shot
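For anyone who wants to try this end to end, here is a minimal self-contained sketch of the isin approach. The sample values and the helper name rows_without_match are made up for illustration and are not the data from the question; only the column names follow it.

```python
import pandas as pd

# Hypothetical sample frames: 'text' is the only shared column.
df1 = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "text": ["foo", "bar", "baz"]})
df2 = pd.DataFrame({"C": [0.5, 0.3, 0.9], "D": [2, 2, 7], "text": ["shot", "shot", "foo"]})

def rows_without_match(left, right, on="text"):
    """Return the rows of `left` whose `on` value never appears in `right`."""
    # isin builds a boolean mask aligned with `left`, so it can index `left` directly.
    return left[~left[on].isin(right[on])]

unmatched = rows_without_match(df2, df1)
print(unmatched)
```

Because the mask is computed from df2 itself, the original index and row order of df2 are preserved in the result.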
EDIT:
import pandas as pd
mergeddf = pd.merge(df2, df1, how="left")
# Rows of df2 with no match in df1 have NaN in df1's column 'A'
result = mergeddf[mergeddf['A'].isna()][['C', 'D', 'text']]
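If you prefer the merge route, a variant that does not depend on a specific column being NaN is to pass indicator=True, which makes pandas add a _merge column saying where each row came from. This is only a sketch under assumptions: the sample values below are hypothetical, and the key is deduplicated first so the left merge cannot multiply rows of df2.

```python
import pandas as pd

# Hypothetical sample frames: 'text' is the only shared column.
df1 = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "text": ["foo", "bar", "baz"]})
df2 = pd.DataFrame({"C": [0.5, 0.3, 0.9], "D": [2, 2, 7], "text": ["shot", "shot", "foo"]})

# Keep only the distinct keys of df1 so each df2 row matches at most once.
keys = df1[["text"]].drop_duplicates()

# indicator=True adds a '_merge' column: 'left_only', 'right_only' or 'both'.
merged = df2.merge(keys, on="text", how="left", indicator=True)

# 'left_only' marks rows of df2 that found no partner in df1.
result = merged.loc[merged["_merge"] == "left_only", ["C", "D", "text"]]
print(result)
```

Unlike the NaN check, this still works when df1's own columns legitimately contain missing values.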