Python pandas: how to remove nan and -inf values

Anonymous (unverified), submitted 2019-12-03 07:50:05

Question:

I have the following DataFrame:

                  time    X    Y  X_t0     X_tp0  X_t1     X_tp1  X_t2     X_tp2
    0         0.002876    0   10     0       NaN   NaN       NaN   NaN       NaN
    1         0.002986    0   10     0       NaN     0       NaN   NaN       NaN
    2         0.037367    1   10     1  1.000000     0       NaN     0       NaN
    3         0.037374    2   10     2  0.500000     1  1.000000     0       NaN
    4         0.037389    3   10     3  0.333333     2  0.500000     1  1.000000
    5         0.037393    4   10     4  0.250000     3  0.333333     2  0.500000
    ....
    1030308   9.962213  256  268   256  0.000000   256  0.003906   255  0.003922
    1030309  10.041799    0  268     0      -inf   256  0.000000   256  0.003906
    1030310  10.118960    0  268     0       NaN     0      -inf   256  0.000000

I tried the following:

    df.dropna(inplace=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.40)

    X_train = X_train.drop('time', axis=1)
    X_train = X_train.drop('X_t1', axis=1)
    X_train = X_train.drop('X_t2', axis=1)
    X_test = X_test.drop('time', axis=1)
    X_test = X_test.drop('X_t1', axis=1)
    X_test = X_test.drop('X_t2', axis=1)
    X_test.fillna(X_test.mean(), inplace=True)
    X_train.fillna(X_train.mean(), inplace=True)
    y_train.fillna(y_train.mean(), inplace=True)

However, I still get this error whenever I try to fit a regression model with fit(X_train, y_train):

    ValueError: Input contains NaN, infinity or a value too large for dtype('float32').
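The error persists because dropna() and fillna() only treat NaN as missing: inf and -inf are ordinary floats to pandas, so they pass through both calls. Worse, a column mean computed over a column containing -inf is itself -inf, so fillna(X.mean()) can write new -inf values into the NaN slots. A minimal sketch on a small hypothetical frame (the column names x and y are placeholders, not from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with one NaN and one -inf in column "x"
df = pd.DataFrame({"x": [1.0, np.nan, -np.inf, 4.0], "y": [0.5, 0.25, 0.0, 1.0]})

# dropna() removes only the NaN row; the -inf row survives
cleaned = df.dropna()
print(len(cleaned))                         # 3
print(np.isinf(cleaned.to_numpy()).any())   # True

# The column mean skips NaN but not -inf, so the mean of "x" is -inf,
# and fillna() then writes -inf into the former NaN slot
filled = df.fillna(df.mean())
print(filled["x"].tolist())                 # [1.0, -inf, -inf, 4.0]
```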

How can we remove both the NaN and -inf values at the same time?

Answer 1:

Use pd.DataFrame.isin to flag every cell equal to NaN, inf or -inf, then pd.DataFrame.any with axis=1 to find the rows that contain at least one flagged cell. Finally, use the inverted boolean mask to slice the DataFrame.

    import numpy as np

    df[~df.isin([np.nan, np.inf, -np.inf]).any(axis=1)]

                 time    X    Y  X_t0     X_tp0   X_t1     X_tp1   X_t2     X_tp2
    4        0.037389    3   10     3  0.333333    2.0  0.500000    1.0  1.000000
    5        0.037393    4   10     4  0.250000    3.0  0.333333    2.0  0.500000
    1030308  9.962213  256  268   256  0.000000  256.0  0.003906  255.0  0.003922
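A self-contained sketch of the isin approach on a small hypothetical frame (the columns a and b are placeholders, not the question's data). pandas' isin matches NaN against NaN explicitly, which is why a single call can catch all three values:

```python
import numpy as np
import pandas as pd

# Hypothetical data: NaN in row 1, -inf in row 2, +inf in row 3
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, 4.0, 5.0],
    "b": [0.1, 0.2, -np.inf, np.inf, 0.5],
})

# True for each cell equal to NaN, inf or -inf
bad_cells = df.isin([np.nan, np.inf, -np.inf])

# Keep only the rows in which no cell was flagged
clean = df[~bad_cells.any(axis=1)]
print(list(clean.index))  # [0, 4]
```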


Answer 2:

You can replace inf and -inf with NaN, and then select non-null rows.

    df[df.replace([np.inf, -np.inf], np.nan).notnull().all(axis=1)]  # possibly followed by .astype(np.float64)

or

    df.replace([np.inf, -np.inf], np.nan).dropna()
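Both variants can be compared side by side on a small hypothetical frame. The sketch also shows what the axis argument of dropna controls: the default axis=0 drops rows, while axis=1 drops every column that contains a NaN. The np.isfinite line is an alternative not taken from either answer; it treats NaN, inf and -inf as non-finite in one call but only works when every column is numeric.

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 0 has NaN, row 1 has -inf, row 2 is clean
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0],
    "b": [0.1, -np.inf, 0.3],
    "c": [np.nan, 5.0, 6.0],
})

# Convert +/-inf to NaN so that one null check covers all three values
replaced = df.replace([np.inf, -np.inf], np.nan)

# Variant 1: boolean mask of rows whose cells are all non-null
by_mask = df[replaced.notnull().all(axis=1)]

# Variant 2: dropna() with the default axis=0 drops rows containing NaN
by_dropna = replaced.dropna()

# For comparison: axis=1 drops every column containing a NaN instead
cols_only = replaced.dropna(axis=1)

# Alternative (not from the answers): np.isfinite is False for NaN, inf and -inf
by_isfinite = df[np.isfinite(df).all(axis=1)]

print(list(by_mask.index), list(by_dropna.index), list(by_isfinite.index))  # [2] [2] [2]
print(list(cols_only.columns))  # ['a']
```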

Use df.info() to check that all your columns have the expected dtypes (e.g. np.float32 or np.float64).
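Because the error message mentions dtype('float32'), a final check before calling fit can confirm that every column is numeric, that no NaN or infinite values remain, and that every finite value fits in the float32 range. A minimal sketch, where check_features is a hypothetical helper name rather than a pandas or scikit-learn function:

```python
import numpy as np
import pandas as pd

def check_features(df: pd.DataFrame) -> list:
    """Return a list of problems that would break a float32 conversion."""
    problems = []
    non_numeric = df.select_dtypes(exclude="number").columns
    if len(non_numeric):
        problems.append(f"non-numeric columns: {list(non_numeric)}")
    # Convert numeric columns to a float64 array; missing values become NaN
    values = df.select_dtypes(include="number").to_numpy(dtype=np.float64, na_value=np.nan)
    if np.isnan(values).any():
        problems.append("contains NaN")
    if np.isinf(values).any():
        problems.append("contains inf or -inf")
    # Finite values larger than float32's maximum would overflow to inf
    finite = values[np.isfinite(values)]
    if (np.abs(finite) > np.finfo(np.float32).max).any():
        problems.append("values exceed the float32 range")
    return problems

# Hypothetical example frame with all three numeric problems
df = pd.DataFrame({"a": [1.0, 1e39], "b": [np.nan, -np.inf]})
print(check_features(df))
```

An empty list means the frame should pass the NaN/infinity check that raised the ValueError above.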


