Parallelizing random forests

Submitted by 99封情书 on 2020-01-04 07:46:22

Question


Through searching and asking, I've found many packages that let me use all the cores of my server, and many packages that implement random forests.

I'm quite new at this, and I'm getting lost among all the ways to parallelize the training of my random forest. Could you give some advice on reasons to use or avoid each of them, or on specific combinations (with or without caret) that have proven themselves?

Packages for parallelization:

doParallel,

doSNOW,

doSMP (discontinued?),

doMC

(and what about mclapply?)


Packages for random forest:

[caret + some of the following]

rf,

parRF,

randomForest,

ranger,

Rborist,

parallelRandomForest (crashes my RStudio session...)

Thanks


Answer 1:


There are a few answers on SO that I would take a look at, such as "parallel execution of random forest in R" and "Suggestions for speeding up Random Forests".

Those posts are helpful, but they are a bit older. The ranger package is an especially fast implementation of random forest, so if you are new to this it might be the easiest way to speed up your model training. The ranger authors' paper discusses the tradeoffs of some of the available packages: which package gives you the best performance will vary with your data size and number of features.
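As a rough illustration of the ranger suggestion, here is a minimal sketch of a multithreaded fit. It assumes the ranger package is installed and uses the built-in iris data purely as a stand-in for your own data; the choice of "all cores but one" and the seed value are illustrative assumptions, not recommendations from the answer.

```r
library(ranger)

# Assumption: leave one core free for the rest of the system.
# detectCores() can return NA on some platforms, so fall back to 1.
cores <- parallel::detectCores()
if (is.na(cores)) cores <- 1L
n_threads <- max(1L, cores - 1L)

# ranger handles threading internally through num.threads,
# so no foreach backend (doParallel, doSNOW, doMC) is needed here.
fit <- ranger(
  Species ~ .,              # stand-in formula on the built-in iris data
  data = iris,
  num.trees = 500,
  num.threads = n_threads,
  seed = 42                 # illustrative seed for reproducibility
)

# Out-of-bag prediction error of the fitted forest
print(fit$prediction.error)
```

Because the threading happens inside ranger itself, this can be tried before setting up any of the parallel backends listed in the question.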



Source: https://stackoverflow.com/questions/37213279/parallelizing-random-forests
