random-forest

Plot one of 500 trees in the randomForest package

自古美人都是妖i submitted on 2019-12-11 04:54:58
Question: How can I plot the trees in the output of the randomForest function (from the package of the same name) in R? For example, with the iris data I want to plot the first of the 500 output trees. My code is:

model <- randomForest(Species ~ ., data = iris, ntree = 500)

Answer 1: You can use the getTree() function in the randomForest package (official guide: https://cran.r-project.org/web/packages/randomForest/randomForest.pdf). On the iris dataset:

require(randomForest)
data(iris)
## we have a look at the k-th tree in the forest
k <- 10
getTree…
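The answer excerpt is cut off at the getTree call, so the remainder of the original answer is not shown here. As a minimal, self-contained sketch of the idea it points at: getTree() returns the k-th tree as a table of nodes rather than a picture, so "plotting" it means working with that table yourself (or handing it to a third-party helper such as the reprtree package, which is not part of randomForest).

require(randomForest)
data(iris)
set.seed(1)
model <- randomForest(Species ~ ., data = iris, ntree = 500)

## extract the first of the 500 trees as a data frame, one row per node;
## labelVar = TRUE replaces numeric codes with variable and class names
tree_1 <- getTree(model, k = 1, labelVar = TRUE)
head(tree_1)

Each row lists the split variable, split point and the left/right daughter nodes, which is the raw material any tree-drawing code would consume.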

R Rolling Random Forest for Variable Selection [closed]

爷,独闯天下 submitted on 2019-12-11 04:45:31
Question: It's difficult to tell what is being asked here. This question is ambiguous, vague, incomplete, overly broad, or rhetorical and cannot be reasonably answered in its current form. For help clarifying this question so that it can be reopened, visit the help center. Closed 7 years ago.

I've got a daily OHLC dataset of the Euro Stoxx 50 index since 2008 which looks like this:

Open High Low Close Volume Adjusted
2008-01-02 4393.53 4411.59 4330.73 4339.23 0 4339.23
2008-01-03 4335.91 4344.36 4312…
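The question was closed without an answer, so nothing below comes from the original thread. As a hedged sketch of what a "rolling" random forest for variable selection might look like, the following fits a forest on a sliding window and collects variable importance per window; the synthetic prices data frame, the 250-day window and the use of importance() are all assumptions made here for illustration, and the model itself (same-day OHLC predicting the close) is deliberately naive.

library(randomForest)
set.seed(1)

## purely synthetic stand-in for the Euro Stoxx 50 OHLC data in the question
n <- 1000
close <- 4000 + cumsum(rnorm(n, sd = 20))
prices <- data.frame(Open   = close + rnorm(n, sd = 5),
                     High   = close + abs(rnorm(n, sd = 10)),
                     Low    = close - abs(rnorm(n, sd = 10)),
                     Close  = close,
                     Volume = rpois(n, 1e5))

window <- 250                       # roughly one trading year; arbitrary choice
ends   <- seq(window, n, by = 20)   # refit every 20 days

rolling_importance <- lapply(ends, function(i) {
  train <- prices[(i - window + 1):i, ]
  fit   <- randomForest(Close ~ ., data = train, ntree = 200)
  importance(fit)[, 1]              # IncNodePurity per predictor in this window
})
## one importance vector per window; turning these into a selection rule is
## exactly the part the closed question never got an answer to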

h2o DRF unseen categorical values handling

坚强是说给别人听的谎言 submitted on 2019-12-11 04:27:13
Question: The documentation for DRF states: "What happens when you try to predict on a categorical level not seen during training? DRF converts a new categorical level to a NA value in the test set, and then splits left on the NA value during scoring. The algorithm splits left on NA values because, during training, NA values are grouped with the outliers in the left-most bin." Questions: So h2o converts unseen levels to NAs and then treats them the same way as NAs in the training data. But what if there…
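The excerpt is cut off in the middle of the follow-up question, so the original answer is not visible here. The documented behaviour quoted above can at least be observed directly; the following is a small sketch using h2o's R interface, with made-up data and column names:

library(h2o)
h2o.init()

## toy training frame with one categorical and one numeric predictor
train_df <- data.frame(color  = factor(rep(c("red", "blue", "green"), each = 4)),
                       x      = 1:12,
                       target = factor(rep(c("yes", "no"), times = 6)))
train <- as.h2o(train_df)

drf <- h2o.randomForest(x = c("color", "x"), y = "target",
                        training_frame = train, ntrees = 20, seed = 1)

## "purple" was never seen during training; per the docs it is scored as NA
## and sent down the left branch at any split on color
test <- as.h2o(data.frame(color = factor("purple"), x = 2.5))
h2o.predict(drf, test)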

How to binarize RandomForest to plot a ROC in Python?

心不动则不痛 submitted on 2019-12-11 04:15:52
Question: I have 21 classes and I am using RandomForest. I want to plot a ROC curve, so I checked the scikit-learn example "ROC with SVM". The example uses SVM; SVM has parameters like probability and decision_function_shape which RF does not. So how can I binarize RandomForest and plot a ROC? Thank you.

EDIT: To create the fake data, so that there are 20 features and 21 classes (3 samples for each class):

df = pd.DataFrame(np.random.rand(63, 20))
label = np.arange(len(df)) // 3 + 1
df['label'] = label
df
# TO TRAIN THE…

Errors with createGrid for rf (randomForest) when using caret

风流意气都作罢 submitted on 2019-12-11 04:14:44
Question: When I try to create a grid of parameters for training with caret I get various errors:

> my_grid <- createGrid("rf")
Error in if (p <= len) { : argument is of length zero
> my_grid <- createGrid("rf", 4)
Error in if (p <= len) { : argument is of length zero
> my_grid <- createGrid("rf", len=4)
Error in if (p <= len) { : argument is of length zero

The documentation for createGrid says: This function creates a data frame that contains a grid of complexity parameters specific methods. Usage:…
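The excerpt stops inside the createGrid documentation, so the original answer is not shown. As a hedged workaround sketch: in current versions of caret the usual approach is to build the grid by hand with expand.grid() and pass it to train() via tuneGrid, which sidesteps createGrid entirely; for method = "rf" the only tunable parameter is mtry (very old caret versions expected the column to be named .mtry instead).

library(caret)
library(randomForest)
data(iris)
set.seed(1)

## hand-built grid of the single rf tuning parameter
rf_grid <- expand.grid(mtry = c(1, 2, 3, 4))

fit <- train(Species ~ ., data = iris,
             method    = "rf",
             tuneGrid  = rf_grid,
             trControl = trainControl(method = "cv", number = 5))
fit$bestTune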

How can I create a Partial Dependence plot for a categorical variable in R?

*爱你&永不变心* submitted on 2019-12-11 03:36:14
Question: I am working with the R package randomForest and have successfully built a random forest model and an importance plot. I am working with a dichotomous response and several categorical predictors. However, I can't figure out how to make partial dependence plots for my categorical variables. I have tried using the randomForest command partialPlot, but I get the following error:

> partialPlot(rf.5, rf.train.1, religion)
Error in is.finite(x) : default method not implemented for type 'list'

So…
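The excerpt ends before any answer, so the following is only a sketch of partialPlot() applied to a factor predictor on a plain data.frame; the simulated outcome, religion and age columns stand in for the asker's data and are assumptions made here. When the chosen variable is a factor, partialPlot draws a barplot of partial dependence rather than a curve.

library(randomForest)
set.seed(1)

## simulated stand-in: dichotomous response, one factor and one numeric predictor
d <- data.frame(
  outcome  = factor(sample(c("no", "yes"), 300, replace = TRUE)),
  religion = factor(sample(c("a", "b", "c", "d"), 300, replace = TRUE)),
  age      = rnorm(300, mean = 40, sd = 10)
)

rf <- randomForest(outcome ~ ., data = d, ntree = 300)

## pred.data should be a plain data.frame; x.var may be given as a character string
partialPlot(rf, pred.data = d, x.var = "religion", which.class = "yes")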

Parallel processing in R

萝らか妹 submitted on 2019-12-11 03:22:09
Question: I'm working with a custom random forest function that requires both a starting and an ending point in a set of genomic data (about 56k columns). I'd like to split the column numbers into subgroups and allow each subgroup to be processed individually to speed things up. I tried this (unsuccessfully) with the following code:

library(foreach)
library(doMC)
foreach(startMrk=(markers$start), endMrk=(markers$end)) %dopar%
  rfFunction(genoA,genoB,0.8,ntree=100,startMrk=startMrk,endMrk=endMrk)

Where…
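The excerpt cuts off at "Where", so the asker's full setup and the original answer are not visible. One detail that often makes this pattern run sequentially (or not at all) is that no parallel backend was registered; the sketch below shows the same foreach structure with registerDoMC(), using dummy stand-ins for markers, genoA, genoB and rfFunction, which are assumptions here rather than the asker's real objects.

library(foreach)
library(doMC)            # Unix-alike only; on Windows doParallel is the analogue
registerDoMC(cores = 4)  # without this, %dopar% falls back to sequential execution

## dummy stand-ins for the asker's objects
markers <- data.frame(start = c(1, 101, 201), end = c(100, 200, 300))
genoA <- genoB <- NULL
rfFunction <- function(genoA, genoB, frac, ntree, startMrk, endMrk) {
  ## placeholder: the real function fits a random forest on columns startMrk:endMrk
  c(start = startMrk, end = endMrk)
}

results <- foreach(startMrk = markers$start,
                   endMrk   = markers$end) %dopar% {
  rfFunction(genoA, genoB, 0.8, ntree = 100, startMrk = startMrk, endMrk = endMrk)
}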

Get a call object, change parameters, and run it again with the new parameters

六月ゝ 毕业季﹏ submitted on 2019-12-11 03:18:18
Question: I have a model generated from a random forest. Inside it there is an attribute called call, which gives me the actual call with which randomForest was invoked. I want to get this call, remove one column from the model, and run it again. Example:

library(randomForest)
data(iris)
iris.rf <- randomForest(Species~.-Sepal.Length, data=iris, prox=TRUE)
iris.rf$call
# want to remove the field Sepal.length as well
# the call should be then
# randomForest(Species~.-Sepal.Length-Sepal.Width, data…
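The excerpt ends mid-comment, so the answer given on the site is not shown here. One hedged way to do what the question describes is to copy the stored call, overwrite its formula argument, and re-evaluate it; this works because $call is just an unevaluated expression, so eval() re-resolves data = iris in the current environment.

library(randomForest)
data(iris)
iris.rf <- randomForest(Species ~ . - Sepal.Length, data = iris, prox = TRUE)

cl <- iris.rf$call                                       # the stored call object
cl$formula <- Species ~ . - Sepal.Length - Sepal.Width   # drop one more column
iris.rf2 <- eval(cl)                                     # re-run with the edited call
iris.rf2$call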

How to use whole training example to estimate class probabilities in sklearn RandomForest

南笙酒味 submitted on 2019-12-11 01:45:32
Question: I want to use the scikit-learn RandomForestClassifier to estimate the probability that a given example belongs to each of a set of classes, after prior training of course. I know I can get the class probabilities using the predict_proba method, which calculates them as "[...] the mean predicted class probabilities of the trees in the forest". In this question it is mentioned that: "The probabilities returned by a single tree are the normalized class histograms of the leaf a sample lands in." Now, I've been…

Is it necessary to run random forest with cross-validation at the same time?

萝らか妹 submitted on 2019-12-10 22:46:41
Question: Random forest is a robust algorithm. A random forest trains many small trees and provides an OOB accuracy. However, is it necessary to also run cross-validation with a random forest at the same time?

Answer 1: OOB error is an unbiased estimate of the error for random forests, so that's great. But what are you using the cross-validation for? If you are comparing the RF against some other algorithm that isn't using bagging in the same way, you want a low-variance way to compare them. You have to use…
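The answer is cut off, but its point (OOB is already an honest error estimate, and cross-validation mainly matters when comparing against other learners under one common resampling scheme) can be illustrated with a small sketch on iris; the choice of 10 folds and the use of caret here are assumptions for illustration, not part of the original answer.

library(randomForest)
library(caret)
data(iris)
set.seed(42)

## the OOB error estimate comes for free with the forest itself
rf <- randomForest(Species ~ ., data = iris, ntree = 500)
oob_error <- rf$err.rate[rf$ntree, "OOB"]

## an explicit 10-fold CV estimate, as one would use to compare against other models
cv_fit <- train(Species ~ ., data = iris, method = "rf",
                trControl = trainControl(method = "cv", number = 10))
cv_error <- 1 - max(cv_fit$results$Accuracy)

c(oob = oob_error, cv = cv_error)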