statsmodels.api.Logit: ValueError: array must not contain infs or NaNs

Submitted by ≯℡__Kan透↙ on 2019-12-11 01:48:52

Question


I am trying to fit a logistic regression in Python using statsmodels.api.Logit, but I keep running into the error ValueError: array must not contain infs or NaNs.

The error occurs when I run:

import statsmodels.api as sm

# data is a pandas DataFrame; the first column, 'admit', is the target
data['intercept'] = 1.0
train_cols = data.columns[1:]
logit = sm.Logit(data['admit'], data[train_cols])
result = logit.fit(start_params=None, method='bfgs', maxiter=20, full_output=1, disp=1, callback=None)

The data has 2000 rows and more than 15000 columns; data['admit'] is the target value and data[train_cols] holds the features. Can anyone give me some hints on how to fix this problem?


Answer 1:


By default, Logit does not check your data for unprocessable infinities (np.inf) or NaNs (np.nan). In pandas, the latter normally signifies a missing entry.

To ignore rows with missing data and proceed with the rest, use missing='drop' like so:

sm.Logit(data['admit'], data[train_cols], missing='drop')

See the Logit docs for other options.

If you do not expect your data to contain any missing entries or infinities, perhaps you loaded it incorrectly. Look at data[data.isnull().any(axis=1)] to see which rows contain the problem values. (N.B. isnull() does not flag infinities by default, so they first have to be made to register as null.)
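One possible way to track down the offending entries is sketched below on a tiny made-up frame (the values and the 'gre' and 'gpa' columns are hypothetical). Converting infinities to NaN with replace() is just one approach, chosen here because it does not depend on any pandas option.

```python
import numpy as np
import pandas as pd

# Small hypothetical frame containing one NaN and one infinity.
data = pd.DataFrame({
    "admit": [1.0, 0.0, 1.0, 0.0],
    "gre": [380.0, np.nan, 800.0, 640.0],
    "gpa": [3.61, 3.67, np.inf, 3.19],
})

# isnull() flags NaN but not +/-inf, so convert infinities to NaN first.
cleaned = data.replace([np.inf, -np.inf], np.nan)

# Rows containing at least one missing (or formerly infinite) value.
bad_rows = cleaned[cleaned.isnull().any(axis=1)]

# Number of problem values per column, to see which features are affected.
bad_counts = cleaned.isnull().sum()

print(bad_rows)
print(bad_counts)
```

With the infinities converted, passing cleaned instead of data to sm.Logit together with missing='drop' would exclude those rows as well.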



Source: https://stackoverflow.com/questions/19223408/statsmodel-api-logit-valueerror-array-must-not-contain-infs-or-nans
