Can I use pandas.DataFrame.isin() with a numeric tolerance parameter?

℡╲_俬逩灬. Submitted on 2019-11-30 19:43:31

You can do a similar thing with numpy's isclose:

df[np.isclose(df['A'].values[:, None], [3, 6], atol=.5).any(axis=1)]
Out: 
     A    B
1  6.0  2.0
2  3.3  3.2

np.isclose returns this:

np.isclose(df['A'].values[:, None], [3, 6], atol=.5)
Out: 
array([[False, False],
       [False,  True],
       [ True, False],
       [False, False]], dtype=bool)

This is a pairwise comparison between the elements of df['A'] and [3, 6]. df['A'].values[:, None] reshapes the column to shape (n, 1) so that it broadcasts against the list. Since you want to know whether each element is close to any one of the values in the list, we call .any(axis=1) at the end.
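The original DataFrame is not shown, so here is a minimal self-contained sketch of the same steps on made-up data (the values in df and the name targets are assumptions, chosen only so that the result resembles the output above). Note that np.isclose also applies a relative tolerance (rtol, default 1e-05) on top of atol; passing rtol=0 keeps the check purely absolute.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the DataFrame from the question is not shown.
df = pd.DataFrame({"A": [1.0, 6.0, 3.3, 9.0],
                   "B": [8.0, 2.0, 3.2, 1.0]})

targets = np.array([3, 6])

# Reshape the column to (4, 1) so it broadcasts against targets of shape (2,).
col = df["A"].to_numpy()[:, None]

# Pairwise comparison: entry [i, j] is True if row i is within 0.5 of targets[j].
# rtol=0 disables the relative tolerance so only atol applies.
pairwise = np.isclose(col, targets, rtol=0, atol=0.5)

# A row matches if it is close to at least one target.
mask = pairwise.any(axis=1)
result = df[mask]
print(result)
```

On this sample data the rows with A = 6.0 and A = 3.3 are kept.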


For multiple columns, change the slice a little bit:

mask = np.isclose(df[['A', 'B']].values[:, :, None], [3, 6], atol=0.5).any(axis=(1, 2))
mask
Out: array([False,  True,  True, False], dtype=bool)

You can use this mask to filter the DataFrame (i.e. df[mask]).
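To make the shapes in the multi-column case easier to follow, here is a small sketch on made-up data (the DataFrame values and the intermediate names hits and per_cell are assumptions for illustration). The extra trailing axis gives one comparison per cell and per target, and reducing over axes 1 and 2 collapses that to one flag per row.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the DataFrame from the question is not shown.
df = pd.DataFrame({"A": [1.0, 6.0, 3.3, 9.0],
                   "B": [8.0, 2.0, 3.2, 1.0]})

values = df[["A", "B"]].to_numpy()[:, :, None]   # shape (4, 2, 1)

# Broadcasting against [3, 6] gives shape (4, 2, 2): rows x columns x targets.
hits = np.isclose(values, [3, 6], rtol=0, atol=0.5)

# Per-cell view: is this cell close to any target?
per_cell = hits.any(axis=2)                      # shape (4, 2)

# Per-row view: is any cell in the row close to any target?
mask = hits.any(axis=(1, 2))                     # shape (4,)
print(df[mask])
```

If you need rows where every selected column is close to some target, rather than at least one, per_cell.all(axis=1) would be the mask to use instead.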


If you want to compare df['A'] and df['B'] (and possibly other columns) with different vectors, you can create a separate mask for each column:

mask1 = np.isclose(df['A'].values[:, None], [1, 2, 3], atol=.5).any(axis=1)
mask2 = np.isclose(df['B'].values[:, None], [4, 5], atol=.5).any(axis=1)
mask3 = ...

Then slice:

df[mask1 & mask2]  # or df[mask1 & mask2 & mask3 & ...]
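
When there are many columns, writing out mask1 & mask2 & ... by hand gets tedious. One possible way to generalize it is sketched below; close_to_any and the targets dict are hypothetical names, not part of pandas or numpy, and the sample data is made up.

```python
import numpy as np
import pandas as pd

def close_to_any(series, values, atol):
    """Hypothetical helper: True where an element is within atol of any value."""
    col = series.to_numpy()[:, None]
    return np.isclose(col, values, rtol=0, atol=atol).any(axis=1)

# Hypothetical sample data; the DataFrame from the question is not shown.
df = pd.DataFrame({"A": [1.2, 6.0, 3.3, 2.0],
                   "B": [4.4, 2.0, 3.2, 5.0]})

# One list of target values per column.
targets = {"A": [1, 2, 3], "B": [4, 5]}

masks = [close_to_any(df[name], vals, atol=0.5) for name, vals in targets.items()]

# Equivalent to mask1 & mask2 & ... for however many masks there are.
combined = np.logical_and.reduce(masks)
result = df[combined]
print(result)
```

Here only the rows where both A and B fall near one of their own targets survive.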