pandas cut a series with nan values

Submitted by 你说的曾经没有我的故事 on 2020-01-15 06:21:32

Question


I would like to apply the pandas cut function to a series that includes NaNs. The desired behavior is that it buckets the non-NaN elements and returns NaN for the NaN elements.

import pandas as pd
numbers_with_nan = pd.Series([3,1,2,pd.NaT,3])
numbers_without_nan = numbers_with_nan.dropna()

The cutting works fine for the series without NaNs:

pd.cut(numbers_without_nan, bins=[1,2,3], include_lowest=True)
0      (2.0, 3.0]
1    (0.999, 2.0]
2    (0.999, 2.0]
4      (2.0, 3.0]

When I cut the series that contains NaNs, element 3 is correctly returned as NaN, but the last element is assigned the wrong bin:

pd.cut(numbers_with_nan, bins=[1,2,3], include_lowest=True)
0      (2.0, 3.0]
1    (0.999, 2.0]
2    (0.999, 2.0]
3             NaN
4    (0.999, 2.0]

How can I get the following output?

0      (2.0, 3.0]
1    (0.999, 2.0]
2    (0.999, 2.0]
3             NaN
4      (2.0, 3.0]

Answer 1:


This is strange. The problem isn't pd.NaT itself; it's that your series has object dtype instead of a regular numeric dtype such as float or int.
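
A quick way to confirm this diagnosis is to inspect the dtypes directly. The sketch below compares the series from the question with the same values using np.nan as the missing marker; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Ints mixed with pd.NaT cannot share a numeric dtype, so pandas falls back to object
with_nat = pd.Series([3, 1, 2, pd.NaT, 3])
# With np.nan as the missing marker, pandas can store everything as float64
with_nan = pd.Series([3, 1, 2, np.nan, 3])

print(with_nat.dtype)  # object
print(with_nan.dtype)  # float64
```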

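A quick fix is to replace pd.NaT with np.nan via fillna. In the pandas version used here, this converts the series from object to float64 dtype, which may also improve performance. (Newer pandas releases deprecate this implicit downcasting in fillna, so the explicit conversion shown further down is the more robust option.)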

import numpy as np
import pandas as pd

s = pd.Series([3, 1, 2, pd.NaT, 3])

res = pd.cut(s.fillna(np.nan), bins=[1, 2, 3], include_lowest=True)

print(res)

0    (2, 3]
1    [1, 2]
2    [1, 2]
3       NaN
4    (2, 3]
dtype: category
Categories (2, object): [[1, 2] < (2, 3]]

A more general solution is to convert to numeric explicitly beforehand:

s = pd.to_numeric(s, errors='coerce')
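
Put together, the explicit conversion followed by the cut might look like the sketch below. It assumes that errors='coerce' maps pd.NaT (which is not a number) to NaN, so the series becomes float64 before binning; the exact interval labels printed depend on the pandas version, so the check uses the category codes instead (-1 marks a missing value).

```python
import pandas as pd

s = pd.Series([3, 1, 2, pd.NaT, 3])  # object dtype because of pd.NaT

# errors='coerce' turns anything that cannot be parsed as a number into NaN,
# leaving a plain float64 Series
numeric = pd.to_numeric(s, errors="coerce")

res = pd.cut(numeric, bins=[1, 2, 3], include_lowest=True)
print(res)

# Category codes: 0 is the lower bin, 1 is (2, 3], -1 is NaN
print(res.cat.codes.tolist())  # [1, 0, 0, -1, 1]
```

Element 4 now lands in the (2, 3] bin, matching the desired output in the question.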


Source: https://stackoverflow.com/questions/53080937/pandas-cut-a-series-with-nan-values
