I've got a script updating 5-10 columns' worth of data, but sometimes the start CSV will be identical to the end CSV, so instead of writing an identical CSV file I want it to
To pull out the symmetric differences:
df_diff = pd.concat([df1,df2]).drop_duplicates(keep=False)
For example:
df1 = pd.DataFrame({
    'num': [1, 4, 3],
    'name': ['a', 'b', 'c'],
})
df2 = pd.DataFrame({
    'num': [1, 2, 3],
    'name': ['a', 'b', 'd'],
})
Will yield:
   num name
1    4    b
2    3    c
1    2    b
2    3    d
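If you want to run the example end to end, here is a minimal self-contained sketch of it. The `has_changes` name is only an illustration, and using `df_diff.empty` as a "did anything change" check is my assumption about how the result might be used. It only holds if neither frame contains duplicate rows of its own.

```python
import pandas as pd

df1 = pd.DataFrame({
    'num': [1, 4, 3],
    'name': ['a', 'b', 'c'],
})
df2 = pd.DataFrame({
    'num': [1, 2, 3],
    'name': ['a', 'b', 'd'],
})

# Stack both frames, then drop every row that occurs more than once.
# What remains are the rows present in exactly one of the two frames.
df_diff = pd.concat([df1, df2]).drop_duplicates(keep=False)

# Hypothetical flag (not part of the original answer): True when at least
# one row survives. Assumes neither frame has internal duplicate rows,
# since keep=False would silently drop those as well.
has_changes = not df_diff.empty

print(df_diff)
print(has_changes)
```

Note that the original index labels are kept, so the same label can show up twice in `df_diff`. Passing `ignore_index=True` to `pd.concat` would renumber the rows if that matters to you.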
Note: on pandas versions that warn that the default of the sort argument will change in the future, you can avoid the warning by passing sort=False explicitly, as below:
df_diff = pd.concat([df1,df2], sort=False).drop_duplicates(keep=False)
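One thing the concat approach does not tell you is which frame each differing row came from. As a possible alternative (a different technique from the one above, not a replacement for it), an outer `merge` with `indicator=True` adds a `_merge` column with that information. This sketch assumes both frames share the same column names, so the merge joins on all of them.

```python
import pandas as pd

df1 = pd.DataFrame({
    'num': [1, 4, 3],
    'name': ['a', 'b', 'c'],
})
df2 = pd.DataFrame({
    'num': [1, 2, 3],
    'name': ['a', 'b', 'd'],
})

# Outer merge on all shared columns. indicator=True adds a '_merge' column
# whose values are 'left_only', 'right_only', or 'both'.
merged = df1.merge(df2, how='outer', indicator=True)

# Keep only the rows that were found in exactly one of the two frames.
df_diff = merged[merged['_merge'] != 'both']

print(df_diff)
```

Here `left_only` marks rows that exist only in `df1` and `right_only` marks rows that exist only in `df2`. Row order in the merged result can differ between pandas versions, so do not rely on it.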