Adding statsmodels 'predict' results to a Pandas dataframe

Submitted by 大憨熊 on 2019-12-17 20:28:13

Question


It is common to want to append the results of predictions to the dataset used to make the predictions, but the statsmodels predict function returns (non-indexed) results of a potentially different length than the dataset on which predictions are based.

For example, if the test dataset, test, contains any null entries, then

import statsmodels.api as sm
mod_fit = sm.Logit.from_formula('Y ~ A + B + C', train).fit()
preds = mod_fit.predict(test)

will produce an array that is shorter than the length of test, and cannot be usefully appended with

test['preds'] = preds

And since the result of predict is not indexed, there is no way to recover the rows to which the results should be attached.

What is the idiom for associating predict results with the rows from which they were generated? Is there, perhaps, a way to get predict to return a dataframe that preserves the indices of its argument?


Answer 1:


Predict shouldn't drop any rows. Can you post a minimal working example where this happens? Preserving the pandas index is on my radar and should be fixed in master soon.

https://github.com/statsmodels/statsmodels/issues/1501

Edit: Never mind. This is a known issue. https://github.com/statsmodels/statsmodels/issues/1352
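
Until the index is preserved, one possible workaround (a sketch, not something taken from the linked issues) is to select the complete rows yourself, predict on those, and re-attach the results by index. The helper name attach_predictions and the stand-in fake_predict below are hypothetical; in practice you would presumably pass mod_fit.predict and the columns used in the formula. The sketch assumes predict returns one value per row it is given, in the same order.

```python
import numpy as np
import pandas as pd


def attach_predictions(df, predict, columns, name="preds"):
    """Return a copy of df with predictions aligned to the rows that produced them.

    `predict` is any callable that takes a dataframe and returns one value
    per row (for example, a fitted model's predict method).
    """
    # Keep only rows with no nulls in the predictor columns, so the
    # number of predictions is known to match the number of rows scored.
    complete = df.dropna(subset=columns)
    values = np.asarray(predict(complete))  # positional, unindexed results

    out = df.copy()
    # Wrapping the values in a Series restores the index; assignment then
    # aligns on it, leaving NaN for the rows that were not scored.
    out[name] = pd.Series(values, index=complete.index)
    return out


# Hypothetical stand-in for mod_fit.predict, used only so that the
# example runs without fitting a model.
def fake_predict(frame):
    return (frame["A"] + frame["B"] + frame["C"]).to_numpy()


test = pd.DataFrame(
    {"A": [1.0, np.nan, 3.0], "B": [1.0, 2.0, 3.0], "C": [0.0, 0.0, 1.0]},
    index=[10, 11, 12],
)
result = attach_predictions(test, fake_predict, ["A", "B", "C"])
print(result)
```

Rows with a null predictor keep their place in the dataframe and get NaN in the new column, so the output has the same length and index as the input.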



Source: https://stackoverflow.com/questions/22580477/adding-statsmodels-predict-results-to-a-pandas-dataframe
