I'm doing some work with the randomForest package, and while it works well, it can be time-consuming. Does anyone have any suggestions for speeding things up? I'
Is there a particular reason you're not using Python (namely scikit-learn, which parallelizes through joblib) to implement this? With n_jobs set, I've trained random forests on datasets of similar size in a fraction of the time it takes in R. In my experience, random forests train noticeably faster in Python even without multiprocessing. Here's a quick example of training an RF classifier and cross-validating it in Python. You can also easily extract feature importances and visualize the trees.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             matthews_corrcoef, precision_score, recall_score)
from sklearn.model_selection import StratifiedKFold

# Assumes `data` is a 2-D array that was read in with a header row,
# and that the first column holds the (binary) response variable.
y = data[1:, 0].astype(float)
X = data[1:, 1:].astype(float)

cm = np.zeros((2, 2), dtype=int)  # running confusion matrix over all folds
precision, accuracy, sensitivity, f1, matthews = [], [], [], [], []

# n_jobs=2 builds the trees on two cores in parallel
rf = RandomForestClassifier(n_estimators=100, max_features=5, n_jobs=2)

# Divide the dataset into 5 "folds", with classes equally balanced in each fold
cv = StratifiedKFold(n_splits=5)
for train, test in cv.split(X, y):
    # Refit on the training folds, then predict the held-out fold
    classes = rf.fit(X[train], y[train]).predict(X[test])
    precision.append(precision_score(y[test], classes))
    accuracy.append(accuracy_score(y[test], classes))
    sensitivity.append(recall_score(y[test], classes))
    f1.append(f1_score(y[test], classes))
    matthews.append(matthews_corrcoef(y[test], classes))
    cm += confusion_matrix(y[test], classes)

# Mean of each metric across folds, with +/- two standard deviations
for name, scores in [("Accuracy", accuracy), ("Precision", precision),
                     ("Sensitivity", sensitivity), ("F1", f1),
                     ("Matthews", matthews)]:
    print("%s: %0.2f (+/- %0.2f)" % (name, np.mean(scores), np.std(scores) * 2))
print(cm)
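As for extracting feature importances and looking at the trees, something along these lines should work. This is only a rough sketch: the data comes from make_classification as a stand-in for your real dataset, and the feature_0 ... feature_7 names are made up for illustration. It reads the impurity-based importances from the fitted forest's feature_importances_ attribute and prints a text rendering of one tree with sklearn.tree.export_text.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import export_text

# Synthetic stand-in for the real data (an assumption for illustration only)
X, y = make_classification(n_samples=300, n_features=8, n_informative=3,
                           random_state=0)
# Hypothetical column names; use your own headers here
feature_names = ["feature_%d" % i for i in range(X.shape[1])]

rf = RandomForestClassifier(n_estimators=100, max_features=5, n_jobs=2,
                            random_state=0)
rf.fit(X, y)

# Impurity-based importances: one value per feature, normalized to sum to 1
importances = rf.feature_importances_
ranking = np.argsort(importances)[::-1]  # most important feature first
for idx in ranking:
    print("%-10s %0.3f" % (feature_names[idx], importances[idx]))

# Plain-text view of the first tree in the forest, limited to depth 2
tree_text = export_text(rf.estimators_[0], feature_names=feature_names,
                        max_depth=2)
print(tree_text)
```

If you'd rather have a picture than text, sklearn.tree.plot_tree takes the same kind of estimator (any element of rf.estimators_), but it needs matplotlib installed.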