I\'m doing some work with the randomForest package and while it works well, it can be time-consuming. Any one have any suggestions for speeding things up? I\'
The manual of the foreach package has a section on Parallel Random Forests
(Using The foreach Package, Section 5.1):
> library("foreach")
> library("doSNOW")
> registerDoSNOW(makeCluster(4, type="SOCK"))
> x <- matrix(runif(500), 100)
> y <- gl(2, 50)
> rf <- foreach(ntree = rep(250, 4), .combine = combine, .packages = "randomForest") %dopar%
+ randomForest(x, y, ntree = ntree)
> rf
Call:
randomForest(x = x, y = y, ntree = ntree)
Type of random forest: classification
Number of trees: 1000
If we want want to create a random forest model with a 1000 trees, and our computer has four
cores, we can split up the problem into four pieces by executing the randomForest function four times, with the ntree argument set to 250. Of course, we have to combine the resulting randomForest objects, but the randomForest package comes with a function called combine.