How to make RandomForestClassifier faster?

有些话、适合烂在心里, submitted on 2019-12-23 05:09:07

Question


I am trying to implement the bag-of-words model from the Kaggle site on a Twitter sentiment dataset with around 1M rows. I have already cleaned the data, but in the last step, when I pass my feature vectors and sentiment labels to a random forest classifier, training takes a very long time. Here is my code:

from sklearn.ensemble import RandomForestClassifier
forest = RandomForestClassifier(n_estimators=100, verbose=3)
forest = forest.fit(train_data_features, train["Sentiment"])

train_data_features is a 1048575x5000 sparse matrix. I tried converting it to an array, but that raises a memory error.

What am I doing wrong? Can someone suggest a resource or another way to make this faster? I am an absolute novice in machine learning and do not have much programming background, so any guidance would help.

Thanks in advance.


Answer 1:


Actually, the solution is pretty straightforward: get a strong machine and run the training in parallel. By default RandomForestClassifier runs as a single job on one core, but since a random forest is an ensemble of independently trained trees, you can train these 100 trees in parallel. Just set

forest = RandomForestClassifier(n_estimators=100, verbose=3, n_jobs=-1)

to use all of your cores. You can also limit max_depth, which will speed things up (in the end you will probably need this either way, since RF can overfit badly without any limit on depth).
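As a rough illustration of both suggestions together, here is a minimal sketch on synthetic data. The matrix size, the density, the random labels, and max_depth=20 are assumptions chosen only so the example runs quickly; they are not taken from the question, and a sensible depth limit has to be found by validation on the real data.

```python
import numpy as np
from scipy.sparse import random as sparse_random
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the bag-of-words matrix (hypothetical sizes, far
# smaller than the 1048575x5000 matrix in the question).
rng = np.random.RandomState(0)
X = sparse_random(2000, 500, density=0.01, format="csr", random_state=rng)
y = rng.randint(0, 2, size=X.shape[0])  # random 0/1 labels, illustration only

forest = RandomForestClassifier(
    n_estimators=100,
    n_jobs=-1,      # build the trees in parallel on all available cores
    max_depth=20,   # assumed cap on tree depth; tune on validation data
    random_state=0,
)

# The sparse matrix is passed as-is; no dense conversion is needed.
forest.fit(X, y)
print(len(forest.estimators_), "trees trained")
```

Passing the sparse matrix directly also sidesteps the memory error from the question: a dense copy of a 1048575x5000 matrix at 8 bytes per entry would need roughly 42 GB.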



Source: https://stackoverflow.com/questions/43640546/how-to-make-randomforestclassifier-faster
