Pandas DataFrames with NaNs equality comparison

没有蜡笔的小新 2020-11-29 07:26

In the context of unit testing some functions, I'm trying to establish the equality of two DataFrames using Python pandas:

ipdb> expect
                            


        
5 Answers
  •  再見小時候
    2020-11-29 07:54

    Any equality comparison with np.nan using == evaluates to False; even np.nan == np.nan is False.
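
A minimal sketch of that behaviour, using a small made-up frame (`df` and its values are illustrative, not taken from the question):

```python
import numpy as np
import pandas as pd

# NaN never compares equal to anything, including itself.
nan_equals_itself = (np.nan == np.nan)

# The same rule applies element-wise, so a frame containing NaN
# does not compare fully equal even to an exact copy of itself.
df = pd.DataFrame({"a": [1.0, np.nan], "b": [3.0, 4.0]})
elementwise = (df == df.copy())
all_equal = bool(elementwise.all().all())
```

Here `elementwise` is True everywhere except at the NaN position, which is why a plain `==` check fails in a unit test.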

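    The simplest approach is df1.fillna('NULL') == df2.fillna('NULL'), provided 'NULL' does not occur as a value in the original data. The result is an element-wise boolean DataFrame, so reduce it with .all().all() to get a single answer.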
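
A rough sketch of the sentinel approach and of the caveat attached to it. The frames and the `SENTINEL` name are made up for illustration:

```python
import numpy as np
import pandas as pd

SENTINEL = "NULL"  # assumed not to occur anywhere in the real data

df1 = pd.DataFrame({"a": [1.0, np.nan], "b": ["x", None]})
df2 = pd.DataFrame({"a": [1.0, np.nan], "b": ["x", None]})

# Replace missing values on both sides, then compare element-wise.
same = (df1.fillna(SENTINEL) == df2.fillna(SENTINEL))
frames_match = bool(same.all().all())

# Caveat: if the data really contains the sentinel, a missing value
# and the literal string become indistinguishable.
df3 = pd.DataFrame({"a": [1.0, np.nan], "b": ["x", "NULL"]})
false_positive = bool(
    (df1.fillna(SENTINEL) == df3.fillna(SENTINEL)).all().all()
)
```

In this sketch `false_positive` ends up True even though `df1` and `df3` differ, which is the situation the "not a value in the original data" condition is meant to rule out.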

    To be safe, avoid the sentinel value and mask the NaN positions explicitly:

    Example a) Compare two dataframes with NaN values

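    # Element-wise comparison; NaN positions come out as False
    bools = (df1 == df2)
    # Count positions that are NaN in both frames as equal
    bools[pd.isnull(df1) & pd.isnull(df2)] = True
    assert bools.all().all()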
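
A self-contained version of Example a on made-up data, shown next to two existing pandas helpers, `DataFrame.equals` and `pandas.testing.assert_frame_equal`, which already treat NaNs in the same position as equal. Both helpers also compare dtypes by default, so they are stricter than the mask approach; whether that suits a given test is an assumption to check.

```python
import numpy as np
import pandas as pd
from pandas.testing import assert_frame_equal

df1 = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, 5.0, 6.0]})
df2 = df1.copy()

# Manual approach: positions that are NaN in both frames count as equal.
bools = (df1 == df2)
bools[pd.isnull(df1) & pd.isnull(df2)] = True
manual_equal = bool(bools.all().all())

# Built-in alternatives.
builtin_equal = df1.equals(df2)
assert_frame_equal(df1, df2)  # raises AssertionError if the frames differ

# A frame with one changed cell is reported as different.
df3 = df1.copy()
df3.loc[0, "a"] = 99.0
changed_equal = df1.equals(df3)
```

In a unit test, `assert_frame_equal` has the advantage of reporting where the frames differ rather than only failing.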
    

    Example b) Filter rows in df1 that do not match with df2

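    # Element-wise inequality; NaN positions come out as True
    bools = (df1 != df2)
    # Positions that are NaN in both frames are not real differences
    bools[pd.isnull(df1) & pd.isnull(df2)] = False
    # Keep rows where at least one column differs
    df_outlier = df1[bools.any(axis=1)]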
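
A self-contained sketch of the row filter on made-up frames. It uses `any(axis=1)` so that a row is kept when at least one cell differs; `all(axis=1)` would keep only rows in which every column differs, and is included for comparison.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, 5.0, 6.0]})
df2 = pd.DataFrame({"a": [1.0, np.nan, 9.0], "b": [np.nan, 5.0, 6.0]})

# Mark differing cells, then clear the cells that are NaN on both sides.
bools = (df1 != df2)
bools[pd.isnull(df1) & pd.isnull(df2)] = False

# Rows with at least one differing cell.
df_outlier = df1[bools.any(axis=1)]

# Rows where every cell differs (none in this data).
rows_all_differ = df1[bools.all(axis=1)]
```

Only the last row differs between the two frames (column `a`), so `df_outlier` contains that single row while `rows_all_differ` is empty.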
    

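    (Note: bools[pd.isnull(df1) == pd.isnull(df2)] = False is wrong. The == mask is True not only where both values are NaN but also where both are non-null, so it would clear genuine differences as well.)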
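
A small sketch of why the `==` mask goes wrong, on made-up single-column frames whose first row genuinely differs:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"a": [1.0, np.nan]})
df2 = pd.DataFrame({"a": [2.0, np.nan]})

# True only where both values are NaN.
both_nan = pd.isnull(df1) & pd.isnull(df2)
# True where both are NaN, but also where both are non-null.
same_nullness = pd.isnull(df1) == pd.isnull(df2)

correct = (df1 != df2)
correct[both_nan] = False

wrong = (df1 != df2)
wrong[same_nullness] = False
```

With the `&` mask the difference in the first row (1.0 versus 2.0) survives; with the `==` mask every cell is cleared and the difference is lost.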
