Add random forest predictions as a column in the test file

[亡魂溺海] posted on 2020-05-08 14:33:47

Question


I am working with pandas in Python (in a Jupyter notebook), where I built a Random Forest model for the Titanic data set: https://www.kaggle.com/c/titanic/data

I read in the test and train data, clean both, and add new columns (the same columns to each).

After fitting and re-fitting the model, trying boosting, and so on, I settled on one model:

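 from sklearn.ensemble import RandomForestClassifier
 from sklearn.model_selection import cross_val_score

 X2 = train_data[['Pclass', 'Sex', 'Age', 'richness']]  # feature columns
 rfc_model_3 = RandomForestClassifier(n_estimators=200)
 # %time is a Jupyter magic; it reports how long the cross-validation takes
 %time cross_val_score(rfc_model_3, X2, Y_target).mean()
 rfc_model_3.fit(X2, Y_target)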

Then I predict whether each passenger survived:

 X_test = test_data[['Pclass','Sex','Age','richness']]
 predictions = rfc_model_3.predict(X_test)
 preds = pd.DataFrame(predictions, columns=['Survived'])

Is there a way to add the predictions as a column in the test file?


Answer 1:


Since

rfc_model_3 = RandomForestClassifier(n_estimators=200)
rfc_model_3.predict(X_test)

returns an array of shape [n_samples] (see docs), that is, one prediction per row of X_test, you should be able to assign the model output directly to X_test as a new column without creating an intermediate DataFrame:

X_test['survived'] = rfc_model_3.predict(X_test)
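
Note that X_test is a column selection taken from test_data, so a column assigned to X_test does not show up in test_data, and depending on the pandas version the assignment may emit a SettingWithCopyWarning. If the goal is to have the predictions next to all the original test columns, one option is to assign to test_data instead. The following is a minimal sketch under stated assumptions: the tiny frames are made-up stand-ins for the cleaned Titanic data, `random_state=0` and the output file name `test_with_predictions.csv` are illustrative choices, and none of them come from the original post.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Tiny made-up stand-ins for the cleaned Titanic frames (illustrative values only).
features = ['Pclass', 'Sex', 'Age', 'richness']
train_data = pd.DataFrame({
    'Pclass':   [1, 3, 2, 3, 1, 2],
    'Sex':      [0, 1, 0, 1, 1, 0],
    'Age':      [29.0, 40.0, 18.0, 35.0, 54.0, 27.0],
    'richness': [3, 1, 2, 1, 3, 2],
    'Survived': [1, 0, 1, 0, 0, 1],
})
test_data = pd.DataFrame({
    'PassengerId': [892, 893, 894],
    'Pclass':   [3, 1, 2],
    'Sex':      [1, 0, 1],
    'Age':      [34.5, 47.0, 62.0],
    'richness': [1, 3, 2],
})

# random_state is set only to make this sketch repeatable.
rfc_model_3 = RandomForestClassifier(n_estimators=200, random_state=0)
rfc_model_3.fit(train_data[features], train_data['Survived'])

# Predict from the feature columns only, but assign the result to the full
# test frame so the predictions sit next to the original columns.
test_data['Survived'] = rfc_model_3.predict(test_data[features])

# Hypothetical output path; index=False keeps the row index out of the file.
test_data.to_csv('test_with_predictions.csv', index=False)
```

Selecting `test_data[features]` at prediction time also means the model never sees the newly added column, so the same line can be re-run safely.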

If you want the intermediate result anyway, @EdChum's suggestion in the comments would work fine.
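
The referenced comment is not reproduced here, so the following is not necessarily what it proposed. It is only one possible way to use an intermediate DataFrame: build `preds` on the same index as the test frame and join the two. The frame contents and the `predictions` array below are made-up stand-ins for `test_data` and for the output of `rfc_model_3.predict(X_test)`.

```python
import numpy as np
import pandas as pd

# Made-up test frame whose index is not 0..n-1, e.g. after rows were dropped.
test_data = pd.DataFrame(
    {'PassengerId': [892, 894, 897], 'Pclass': [3, 2, 3]},
    index=[0, 2, 5],
)
# Stand-in for the array returned by rfc_model_3.predict(X_test).
predictions = np.array([0, 1, 0])

# Reusing test_data's index makes the join align row by row; with the default
# 0..n-1 index, rows 2 and 5 here would find no match and end up as NaN.
preds = pd.DataFrame(predictions, columns=['Survived'], index=test_data.index)
result = test_data.join(preds)
```

Passing `index=test_data.index` matters only when the test frame's index is not the default range, but it is harmless otherwise.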



Source: https://stackoverflow.com/questions/37084800/add-random-forest-predictions-as-column-into-test-file
