Comparing two pandas dataframes for differences

感情败类 2020-11-30 02:53

I've got a script updating 5-10 columns' worth of data, but sometimes the start CSV will be identical to the end CSV, so instead of writing an identical CSV file I want it to

8 answers

小蘑菇 (original poster)

2020-11-30 03:35
To pull out the symmetric differences:
```
df_diff = pd.concat([df1,df2]).drop_duplicates(keep=False)
```
For example:
```
df1 = pd.DataFrame({
    'num': [1, 4, 3],
    'name': ['a', 'b', 'c'],
})
df2 = pd.DataFrame({
    'num': [1, 2, 3],
    'name': ['a', 'b', 'd'],
})
```
Will yield:
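
As a self-contained sketch (assuming only that pandas is imported as `pd`), running the one-liner on the two frames above should leave just the rows that occur in exactly one of them. The original index labels are kept, which is why `1` and `2` each appear twice:

```python
import pandas as pd

df1 = pd.DataFrame({
    'num': [1, 4, 3],
    'name': ['a', 'b', 'c'],
})
df2 = pd.DataFrame({
    'num': [1, 2, 3],
    'name': ['a', 'b', 'd'],
})

# Stack both frames, then drop every row that occurs more than once.
# The row (1, 'a') is in both frames, so both copies are removed.
df_diff = pd.concat([df1, df2]).drop_duplicates(keep=False)
print(df_diff)
#    num name
# 1    4    b
# 2    3    c
# 1    2    b
# 2    3    d
```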

Note: until the next release of pandas, `pd.concat` may warn that the default of its `sort` argument will change in the future. To avoid the warning, pass `sort=False` explicitly, as below:
```
df_diff = pd.concat([df1,df2], sort=False).drop_duplicates(keep=False)
```
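
For the use case in the question (not rewriting a CSV that has not changed), one possible way to use this is to check whether the symmetric difference is empty. The sketch below is an illustration, not part of the original answer: `write_if_changed` is a hypothetical helper name, and an in-memory buffer stands in for the real output file.

```python
import io

import pandas as pd


def write_if_changed(before, after, path_or_buf):
    """Write `after` as CSV only if some row is in just one of the frames.

    Hypothetical helper for illustration. Returns True if a write happened.
    """
    diff = pd.concat([before, after], sort=False).drop_duplicates(keep=False)
    if diff.empty:
        return False  # no row-level difference found, so skip the write
    after.to_csv(path_or_buf, index=False)
    return True


start = pd.DataFrame({'num': [1, 4, 3], 'name': ['a', 'b', 'c']})
unchanged = start.copy()
updated = pd.DataFrame({'num': [1, 2, 3], 'name': ['a', 'b', 'd']})

buf = io.StringIO()  # stands in for a real file path
wrote_unchanged = write_if_changed(start, unchanged, buf)  # nothing written
wrote_updated = write_if_changed(start, updated, buf)      # CSV written
```

This check ignores row order and index labels, and because `keep=False` drops every copy of a repeated row, a row that is duplicated inside a single frame can hide a real difference. If an exact comparison is needed, `before.equals(after)` is a stricter alternative.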