I have two dataframes of different sizes (df1 and df2). I would like to remove from df1 all the rows that are also present in df2.
pandas has a method called isin, but it tests single values rather than combinations of columns, so we first need one key per row. We can define a lambda function that builds such a key from the existing 'A' and 'B' columns of df1 and df2, test df1's keys against df2's with isin, negate the result (as we want the rows not in df2) and reset the index:
import pandas as pd

df1 = pd.DataFrame({'A': ['qwe', 'wer', 'wer', 'rty', 'tyu', 'tyu', 'tyu', 'iop'],
                    'B': [5, 6, 6, 9, 7, 7, 7, 1],
                    'C': ['a', 's', 'd', 'f', 'g', 'h', 'j', 'k']})
df2 = pd.DataFrame({'A': ['wer', 'tyu'],
                    'B': [6, 7]})

# Build one string key per row from columns 'A' and 'B', e.g. 'wer_6'
unique_ind = lambda df: df['A'].astype(str) + '_' + df['B'].astype(str)

# Keep only the rows of df1 whose key does not occur among df2's keys
print(df1[~unique_ind(df1).isin(unique_ind(df2))].reset_index(drop=True))
This prints:
     A  B  C
0  qwe  5  a
1  rty  9  f
2  iop  1  k
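
An alternative that avoids building string keys is a left merge with indicator=True, keeping only the rows marked 'left_only'. The sketch below is one possible way to do it, not the method used above; the helper name anti_join is made up for illustration, and it assumes the key columns ('A' and 'B' here) exist in both frames with compatible dtypes.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['qwe', 'wer', 'wer', 'rty', 'tyu', 'tyu', 'tyu', 'iop'],
                    'B': [5, 6, 6, 9, 7, 7, 7, 1],
                    'C': ['a', 's', 'd', 'f', 'g', 'h', 'j', 'k']})
df2 = pd.DataFrame({'A': ['wer', 'tyu'],
                    'B': [6, 7]})


def anti_join(left, right, on):
    """Return the rows of `left` whose `on` columns have no match in `right`.

    `anti_join` is a hypothetical helper name, not a pandas function.
    """
    # Deduplicate the keys so the merge cannot multiply rows of `left`.
    keys = right[on].drop_duplicates()
    merged = left.merge(keys, on=on, how='left', indicator=True)
    # The '_merge' column is 'left_only' for rows with no partner in `right`.
    kept = merged[merged['_merge'] == 'left_only']
    return kept.drop(columns='_merge').reset_index(drop=True)


result = anti_join(df1, df2, on=['A', 'B'])
print(result)
```

Comparing the columns directly also sidesteps a weakness of the string-key approach: if 'A' itself could contain the '_' separator, two different (A, B) pairs might produce the same concatenated key.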