Pandas: Diff of two Dataframes

老子叫甜甜 提交于 2019-11-27 11:40:41

merge the 2 dfs using method 'outer' and pass param indicator=True this will tell you whether the rows are present in both/left only/right only, you can then filter the merged df after:

In [22]:
merged = df1.merge(df2, indicator=True, how='outer')
merged[merged['_merge'] == 'right_only']

Out[22]:
  Buyer  Quantity      _merge
3  Carl         2  right_only
4  Mark         1  right_only
diff = set(zip(df2.Buyer, df2.Quantity)) - set(zip(df1.Buyer, df1.Quantity))

This is the first solution that came to mind. You can then put the diff set back in a DF for presentation.

Try the following if you only care about adding the new Buyers to the other df:

df_delta=df2[df2['Buyer'].apply(lambda x: x not in df1['Buyer'].values)]

you may find this as the best:

df2[ ~df2.isin(df1)].dropna()

@EdChum's answer is self-explained. But using not 'both' condition makes more sense and you do not need to care about the order of comparison, and this is what a real diff supposed to be. For the sake of answering your question:

merged = df1.merge(df2, indicator=True, how='outer')
merged.loc = [merged['_merge'] != 'both']
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!