Comparing two pandas dataframes for differences

感情败类 2020-11-30 02:53

I've got a script updating 5-10 columns' worth of data, but sometimes the start CSV will be identical to the end CSV, so instead of writing an identical CSV file I want it to

8 answers
  •  小蘑菇 (original poster)
    2020-11-30 03:35

    To pull out the symmetric differences:

    df_diff = pd.concat([df1,df2]).drop_duplicates(keep=False)
    

    For example:

    df1 = pd.DataFrame({
        'num': [1, 4, 3],
        'name': ['a', 'b', 'c'],
    })
    df2 = pd.DataFrame({
        'num': [1, 2, 3],
        'name': ['a', 'b', 'd'],
    })
    

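    Running the one-liner above on these two frames yields the rows that appear in only one of them:

       num name
    1    4    b
    2    3    c
    1    2    b
    2    3    d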

    Note: on pandas 0.23 through 0.25, pd.concat emits a FutureWarning about the future default of its sort argument when the frames' columns are not aligned. Passing sort=False explicitly avoids the warning (from pandas 1.0 onward this is already the default):

    df_diff = pd.concat([df1,df2], sort=False).drop_duplicates(keep=False)
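
The concat approach does not record which frame each differing row came from. One possible alternative, shown here only as a sketch and not as part of the original answer, is an outer `DataFrame.merge` with `indicator=True`, which adds a `_merge` column. The name `labelled_diff` is made up for illustration, and the frames are the same ones used in the example above.

```python
import pandas as pd

df1 = pd.DataFrame({'num': [1, 4, 3], 'name': ['a', 'b', 'c']})
df2 = pd.DataFrame({'num': [1, 2, 3], 'name': ['a', 'b', 'd']})

# Outer-merge on all shared columns. indicator=True adds a '_merge' column
# whose values are 'left_only', 'right_only' or 'both'.
merged = df1.merge(df2, how='outer', indicator=True)

# Keep only the rows that are not present in both frames.
# 'labelled_diff' is a hypothetical name used for this sketch.
labelled_diff = merged[merged['_merge'] != 'both']
print(labelled_diff)
```

Rows tagged `left_only` exist only in `df1` and rows tagged `right_only` exist only in `df2`. Row order in the merged result may differ between pandas versions, so it is safer not to rely on it.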
    
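
The question text is cut off, so the following assumes the goal is simply to skip writing the output file when the data did not change. Under that assumption, a minimal sketch can use `DataFrame.equals`; the helper name `write_if_changed` and its `path` parameter are hypothetical.

```python
import pandas as pd


def write_if_changed(original: pd.DataFrame, updated: pd.DataFrame, path: str) -> bool:
    """Write `updated` to `path` as CSV only if it differs from `original`.

    Hypothetical helper for illustration. Returns True if a file was written.
    """
    # DataFrame.equals compares shape, values and dtypes, and treats NaNs
    # in the same positions as equal.
    if original.equals(updated):
        return False
    updated.to_csv(path, index=False)
    return True
```

Note that `equals` is strict about dtypes and about row and column order, whereas checking `df_diff.empty` on the concat result ignores the index and collapses rows that are duplicated within a single frame. Which check fits depends on what counts as "identical" for the CSV in question.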
