Suggestions for speeding up Random Forests

执念已碎 2020-12-01 00:46

I'm doing some work with the randomForest package and, while it works well, it can be time-consuming. Does anyone have any suggestions for speeding things up? I'

4 answers
  •  借酒劲吻你
    2020-12-01 00:49

    Is there any particular reason why you're not using Python (namely the scikit-learn and multiprocessing modules) to implement this? Using joblib, I've trained random forests on datasets of similar size in a fraction of the time it takes in R. In my experience, random forests are significantly faster in Python even without multiprocessing. Here's a quick example of training an RF classifier and cross-validating it in Python. You can also easily extract feature importances and visualize the trees.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                                 matthews_corrcoef, precision_score, recall_score)
    from sklearn.model_selection import StratifiedKFold
    
    # assuming that you have read in data (a 2-D array) with a header row;
    # the first column corresponds to the response variable
    y = data[1:, 0].astype(float)
    X = data[1:, 1:].astype(float)
    
    # per-fold scores; the 2x2 confusion matrix assumes a binary response
    cm = np.array([[0, 0], [0, 0]])
    precision = np.array([])
    accuracy = np.array([])
    sensitivity = np.array([])
    f1 = np.array([])
    matthews = np.array([])
    
    # n_jobs=2 builds the trees on two cores in parallel
    rf = RandomForestClassifier(n_estimators=100, max_features=5, n_jobs=2)
    
    # divide dataset into 5 "folds", where classes are equally balanced in each fold
    cv = StratifiedKFold(n_splits=5)
    for train, test in cv.split(X, y):
        # fit on the training folds, then predict the held-out fold
        classes = rf.fit(X[train], y[train]).predict(X[test])
        precision = np.append(precision, precision_score(y[test], classes))
        accuracy = np.append(accuracy, accuracy_score(y[test], classes))
        sensitivity = np.append(sensitivity, recall_score(y[test], classes))
        f1 = np.append(f1, f1_score(y[test], classes))
        matthews = np.append(matthews, matthews_corrcoef(y[test], classes))
        cm = np.add(cm, confusion_matrix(y[test], classes))
    
    # mean score across folds, plus or minus two standard deviations
    print("Accuracy: %0.2f (+/- %0.2f)" % (accuracy.mean(), accuracy.std() * 2))
    print("Precision: %0.2f (+/- %0.2f)" % (precision.mean(), precision.std() * 2))
    print("Sensitivity: %0.2f (+/- %0.2f)" % (sensitivity.mean(), sensitivity.std() * 2))
    print("F1: %0.2f (+/- %0.2f)" % (f1.mean(), f1.std() * 2))
    print("Matthews: %0.2f (+/- %0.2f)" % (matthews.mean(), matthews.std() * 2))
    print(cm)
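
    The feature importances mentioned above can be read off a fitted forest through its feature_importances_ attribute. Here is a minimal sketch of that step; the synthetic dataset from make_classification and the names X_demo, y_demo and rf_demo are assumptions made up for illustration, so swap in your own X and y.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the real data (an assumption for illustration only).
X_demo, y_demo = make_classification(
    n_samples=500, n_features=10, n_informative=4, random_state=0
)

# Same settings as the classifier above, with a fixed seed for repeatability.
rf_demo = RandomForestClassifier(
    n_estimators=100, max_features=5, n_jobs=2, random_state=0
)
rf_demo.fit(X_demo, y_demo)

# Impurity-based importances: one value per feature, normalized to sum to 1.
importances = rf_demo.feature_importances_

# Column indices sorted from most to least important.
ranking = np.argsort(importances)[::-1]
for idx in ranking:
    print("feature %d: %0.3f" % (idx, importances[idx]))
```

    The indices printed here are column positions in X_demo; with real data you would map them back to the header names you read in.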
    
