In the context of unit testing some functions, I'm trying to establish the equality of two DataFrames using Python pandas:
ipdb> expect
Any equality comparison with np.nan using == is False; even np.nan == np.nan is False. That is why df1 == df2 reports a mismatch at every position where both frames hold NaN.
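A quick way to see the problem on toy data (the frames and column name below are made up for illustration):

```python
import numpy as np
import pandas as pd

nan = float("nan")
nan_equals_itself = (nan == nan)  # NaN never compares equal, not even to itself

# Two frames with identical content, including a NaN in the same position.
df1 = pd.DataFrame({"a": [1.0, np.nan]})
df2 = pd.DataFrame({"a": [1.0, np.nan]})

eq = (df1 == df2)                    # element-wise comparison
cell_results = eq["a"].tolist()      # the NaN position compares as False
all_equal = bool(eq.all().all())     # so the naive check fails

print(nan_equals_itself, cell_results, all_equal)
```

So a plain assert (df1 == df2).all().all() fails as soon as both frames contain NaN in the same cell.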
The simplest approach is to compare df1.fillna('NULL') == df2.fillna('NULL'), provided 'NULL' is not a value that occurs in the original data.
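A minimal sketch of the sentinel approach, assuming small made-up frames. The SENTINEL name and the isin guard are my additions, only there to make the "not a value in the original data" condition explicit:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"x": [1.0, np.nan], "y": ["a", None]})
df2 = pd.DataFrame({"x": [1.0, np.nan], "y": ["a", None]})
df3 = pd.DataFrame({"x": [1.0, 2.0], "y": ["a", None]})  # differs in column x

SENTINEL = "NULL"  # hypothetical placeholder; must not occur in the real data

# Guard: the trick is only valid if the sentinel is absent from every frame.
for frame in (df1, df2, df3):
    assert not frame.isin([SENTINEL]).any().any()

# Non-inplace fillna returns new frames, so the originals are left untouched.
same_12 = bool((df1.fillna(SENTINEL) == df2.fillna(SENTINEL)).all().all())
same_13 = bool((df1.fillna(SENTINEL) == df3.fillna(SENTINEL)).all().all())

print(same_12, same_13)
```

Note that filling a numeric column with a string turns it into an object column in the filled copy, which is fine for a one-off comparison.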
To be safe without relying on a sentinel value, explicitly mask the positions where both frames are NaN:
Example a) Compare two dataframes with NaN values
bools = (df1 == df2)
bools[pd.isnull(df1) & pd.isnull(df2)] = True
assert bools.all().all()
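For use in a unit test, the three lines above can be wrapped in a small helper. This is just a sketch: the name frames_equal_with_nan and the sample frames are hypothetical, and it assumes both frames carry identical row and column labels (otherwise == raises a ValueError).

```python
import numpy as np
import pandas as pd

def frames_equal_with_nan(df1, df2):
    """Return True if df1 and df2 match cell by cell, treating NaN == NaN as equal."""
    bools = (df1 == df2)
    # Positions where both frames are null count as matches.
    bools[pd.isnull(df1) & pd.isnull(df2)] = True
    return bool(bools.all().all())

left = pd.DataFrame({"a": [1.0, np.nan], "b": [np.nan, 4.0]})
same = pd.DataFrame({"a": [1.0, np.nan], "b": [np.nan, 4.0]})
other = pd.DataFrame({"a": [1.0, 2.0], "b": [np.nan, 4.0]})  # NaN replaced by 2.0

result_same = frames_equal_with_nan(left, same)
result_other = frames_equal_with_nan(left, other)

print(result_same, result_other)
```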
Example b) Select the rows in df1 that do not match df2
bools = (df1 != df2)
bools[pd.isnull(df1) & pd.isnull(df2)] = False
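df_outlier = df1[bools.any(axis=1)]  # any(axis=1): a row mismatches if at least one cell differs; all(axis=1) would keep only rows where every column differs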
(Note: bools[pd.isnull(df1) == pd.isnull(df2)] = False is wrong. That mask is also True wherever both values are non-null, so it would clear genuine mismatches too.)
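To see the difference between the two masks, here is a small sketch on made-up data where one cell genuinely differs (3.0 vs 9.0) and one position is NaN in both frames:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [10.0, 20.0, 30.0]})
df2 = pd.DataFrame({"a": [1.0, np.nan, 9.0], "b": [10.0, 20.0, 30.0]})

diff = (df1 != df2)  # True at the NaN/NaN cell and at the 3.0 vs 9.0 cell

# Mask with &: only cells that are null in BOTH frames are cleared.
correct = diff.copy()
correct[pd.isnull(df1) & pd.isnull(df2)] = False

# Mask with ==: True wherever the null status agrees, including
# non-null/non-null cells, so the real mismatch is cleared as well.
wrong = diff.copy()
wrong[pd.isnull(df1) == pd.isnull(df2)] = False

correct_cells = int(correct.sum().sum())  # number of mismatching cells found
wrong_cells = int(wrong.sum().sum())

# Row-level reductions over the correct mask:
rows_any = df1[correct.any(axis=1)].index.tolist()  # at least one column differs
rows_all = df1[correct.all(axis=1)].index.tolist()  # every column differs

print(correct_cells, wrong_cells, rows_any, rows_all)
```

With the & mask the single real mismatch survives; with the == mask it is silently dropped.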