random-forest

Plot one of 500 trees in the randomForest package

自古美人都是妖i submitted on 2019-12-11 04:54:58
Question: How can I plot the trees in the output of the randomForest function (from the package of the same name) in R? For example, with the iris data I want to plot the first of the 500 output trees. My code is:

model <- randomForest(Species ~ ., data = iris, ntree = 500)

Answer 1: You can use the getTree() function in the randomForest package (official guide: https://cran.r-project.org/web/packages/randomForest/randomForest.pdf). On the iris dataset:

require(randomForest)
data(iris)
## we have a look at the k-th tree in the forest
k <- 10
getTree…
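The answer excerpt is cut off at the getTree call, so the remainder of the original answer is not shown here. As a minimal, self-contained sketch of the idea it points at: getTree() returns the k-th tree as a table of nodes rather than a picture, so "plotting" it means working with that table yourself (or handing it to a third-party helper such as the reprtree package, which is not part of randomForest).

require(randomForest)
data(iris)
set.seed(1)
model <- randomForest(Species ~ ., data = iris, ntree = 500)

## extract the first of the 500 trees as a data frame, one row per node;
## labelVar = TRUE replaces numeric codes with variable and class names
tree_1 <- getTree(model, k = 1, labelVar = TRUE)
head(tree_1)

Each row lists the split variable, split point and the left/right daughter nodes, which is the raw material any tree-drawing code would consume.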

R Rolling Random Forest for Variable Selection [closed]

爷,独闯天下 submitted on 2019-12-11 04:45:31
Question: It's difficult to tell what is being asked here. This question is ambiguous, vague, incomplete, overly broad, or rhetorical and cannot be reasonably answered in its current form. For help clarifying this question so that it can be reopened, visit the help center. Closed 7 years ago.

I've got a daily OHLC dataset of the Euro Stoxx 50 index since 2008 which looks like this:

Open High Low Close Volume Adjusted
2008-01-02 4393.53 4411.59 4330.73 4339.23 0 4339.23
2008-01-03 4335.91 4344.36 4312…
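The question was closed without an answer, so nothing below comes from the original thread. As a hedged sketch of what a "rolling" random forest for variable selection might look like, the following fits a forest on a sliding window and collects variable importance per window; the synthetic prices data frame, the 250-day window and the use of importance() are all assumptions made here for illustration, and the model itself (same-day OHLC predicting the close) is deliberately naive.

library(randomForest)
set.seed(1)

## purely synthetic stand-in for the Euro Stoxx 50 OHLC data in the question
n <- 1000
close <- 4000 + cumsum(rnorm(n, sd = 20))
prices <- data.frame(Open   = close + rnorm(n, sd = 5),
                     High   = close + abs(rnorm(n, sd = 10)),
                     Low    = close - abs(rnorm(n, sd = 10)),
                     Close  = close,
                     Volume = rpois(n, 1e5))

window <- 250                       # roughly one trading year; arbitrary choice
ends   <- seq(window, n, by = 20)   # refit every 20 days

rolling_importance <- lapply(ends, function(i) {
  train <- prices[(i - window + 1):i, ]
  fit   <- randomForest(Close ~ ., data = train, ntree = 200)
  importance(fit)[, 1]              # IncNodePurity per predictor in this window
})
## one importance vector per window; turning these into a selection rule is
## exactly the part the closed question never got an answer to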

h2o DRF unseen categorical values handling

坚强是说给别人听的谎言 submitted on 2019-12-11 04:27:13
Question: The documentation for DRF states: "What happens when you try to predict on a categorical level not seen during training? DRF converts a new categorical level to a NA value in the test set, and then splits left on the NA value during scoring. The algorithm splits left on NA values because, during training, NA values are grouped with the outliers in the left-most bin." Questions: So h2o converts unseen levels to NAs and then treats them the same way as NAs in the training data. But what if there…
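The excerpt is cut off in the middle of the follow-up question, so the original answer is not visible here. The documented behaviour quoted above can at least be observed directly; the following is a small sketch using h2o's R interface, with made-up data and column names:

library(h2o)
h2o.init()

## toy training frame with one categorical and one numeric predictor
train_df <- data.frame(color  = factor(rep(c("red", "blue", "green"), each = 4)),
                       x      = 1:12,
                       target = factor(rep(c("yes", "no"), times = 6)))
train <- as.h2o(train_df)

drf <- h2o.randomForest(x = c("color", "x"), y = "target",
                        training_frame = train, ntrees = 20, seed = 1)

## "purple" was never seen during training; per the docs it is scored as NA
## and sent down the left branch at any split on color
test <- as.h2o(data.frame(color = factor("purple"), x = 2.5))
h2o.predict(drf, test)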

How to binarize RandomForest to plot a ROC in Python?

心不动则不痛 submitted on 2019-12-11 04:15:52
Question: I have 21 classes and I am using RandomForest. I want to plot a ROC curve, so I checked the scikit-learn example "ROC with SVM". The example uses SVM; SVM has parameters like probability and decision_function_shape which RF does not. So how can I binarize RandomForest and plot a ROC? Thank you.

EDIT: To create the fake data, so that there are 20 features and 21 classes (3 samples for each class):

df = pd.DataFrame(np.random.rand(63, 20))
label = np.arange(len(df)) // 3 + 1
df['label'] = label
df
# TO TRAIN THE…

Errors with createGrid for rf (randomForest) when using caret

风流意气都作罢 submitted on 2019-12-11 04:14:44
Question: When I try to create a grid of parameters for training with caret I get various errors:

> my_grid <- createGrid("rf")
Error in if (p <= len) { : argument is of length zero
> my_grid <- createGrid("rf", 4)
Error in if (p <= len) { : argument is of length zero
> my_grid <- createGrid("rf", len=4)
Error in if (p <= len) { : argument is of length zero

The documentation for createGrid says: This function creates a data frame that contains a grid of complexity parameters specific methods. Usage:…
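The excerpt stops inside the createGrid documentation, so the original answer is not shown. As a hedged workaround sketch: in current versions of caret the usual approach is to build the grid by hand with expand.grid() and pass it to train() via tuneGrid, which sidesteps createGrid entirely; for method = "rf" the only tunable parameter is mtry (very old caret versions expected the column to be named .mtry instead).

library(caret)
library(randomForest)
data(iris)
set.seed(1)

## hand-built grid of the single rf tuning parameter
rf_grid <- expand.grid(mtry = c(1, 2, 3, 4))

fit <- train(Species ~ ., data = iris,
             method    = "rf",
             tuneGrid  = rf_grid,
             trControl = trainControl(method = "cv", number = 5))
fit$bestTune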

How can I create a Partial Dependence plot for a categorical variable in R?

*爱你&永不变心* submitted on 2019-12-11 03:36:14
Question: I am working with the R package randomForest and have successfully built a random forest model and an importance plot. I am working with a dichotomous response and several categorical predictors. However, I can't figure out how to make partial dependence plots for my categorical variables. I have tried using the randomForest command partialPlot, but I get the following error:

> partialPlot(rf.5, rf.train.1, religion)
Error in is.finite(x) : default method not implemented for type 'list'

So…
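The excerpt ends before any answer, so the following is only a sketch of partialPlot() applied to a factor predictor on a plain data.frame; the simulated outcome, religion and age columns stand in for the asker's data and are assumptions made here. When the chosen variable is a factor, partialPlot draws a barplot of partial dependence rather than a curve.

library(randomForest)
set.seed(1)

## simulated stand-in: dichotomous response, one factor and one numeric predictor
d <- data.frame(
  outcome  = factor(sample(c("no", "yes"), 300, replace = TRUE)),
  religion = factor(sample(c("a", "b", "c", "d"), 300, replace = TRUE)),
  age      = rnorm(300, mean = 40, sd = 10)
)

rf <- randomForest(outcome ~ ., data = d, ntree = 300)

## pred.data should be a plain data.frame; x.var may be given as a character string
partialPlot(rf, pred.data = d, x.var = "religion", which.class = "yes")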

Parallel processing in R

萝らか妹 submitted on 2019-12-11 03:22:09
Question: I'm working with a custom random forest function that requires both a starting and an ending point in a set of genomic data (about 56k columns). I'd like to split the column numbers into subgroups and allow each subgroup to be processed individually to speed things up. I tried this (unsuccessfully) with the following code:

library(foreach)
library(doMC)
foreach(startMrk=(markers$start), endMrk=(markers$end)) %dopar%
  rfFunction(genoA,genoB,0.8,ntree=100,startMrk=startMrk,endMrk=endMrk)

Where…
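The excerpt cuts off at "Where", so the asker's full setup and the original answer are not visible. One detail that often makes this pattern run sequentially (or not at all) is that no parallel backend was registered; the sketch below shows the same foreach structure with registerDoMC(), using dummy stand-ins for markers, genoA, genoB and rfFunction, which are assumptions here rather than the asker's real objects.

library(foreach)
library(doMC)            # Unix-alike only; on Windows doParallel is the analogue
registerDoMC(cores = 4)  # without this, %dopar% falls back to sequential execution

## dummy stand-ins for the asker's objects
markers <- data.frame(start = c(1, 101, 201), end = c(100, 200, 300))
genoA <- genoB <- NULL
rfFunction <- function(genoA, genoB, frac, ntree, startMrk, endMrk) {
  ## placeholder: the real function fits a random forest on columns startMrk:endMrk
  c(start = startMrk, end = endMrk)
}

results <- foreach(startMrk = markers$start,
                   endMrk   = markers$end) %dopar% {
  rfFunction(genoA, genoB, 0.8, ntree = 100, startMrk = startMrk, endMrk = endMrk)
}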

Get a call object, change parameters, and run it again with the new parameters

六月ゝ 毕业季﹏ submitted on 2019-12-11 03:18:18
Question: I have a model generated from a random forest. Inside it there is an attribute called call, which gives me the actual call with which randomForest was invoked. I want to get this call, remove one column from the model, and run it again. Example:

library(randomForest)
data(iris)
iris.rf <- randomForest(Species~.-Sepal.Length, data=iris, prox=TRUE)
iris.rf$call
# want to remove the field Sepal.length as well
# the call should be then
# randomForest(Species~.-Sepal.Length-Sepal.Width, data…
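The excerpt ends mid-comment, so the answer given on the site is not shown here. One hedged way to do what the question describes is to copy the stored call, overwrite its formula argument, and re-evaluate it; this works because $call is just an unevaluated expression, so eval() re-resolves data = iris in the current environment.

library(randomForest)
data(iris)
iris.rf <- randomForest(Species ~ . - Sepal.Length, data = iris, prox = TRUE)

cl <- iris.rf$call                                       # the stored call object
cl$formula <- Species ~ . - Sepal.Length - Sepal.Width   # drop one more column
iris.rf2 <- eval(cl)                                     # re-run with the edited call
iris.rf2$call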

How to use whole training example to estimate class probabilities in sklearn RandomForest

南笙酒味 submitted on 2019-12-11 01:45:32
Question: I want to use the scikit-learn RandomForestClassifier to estimate the probability that a given example belongs to each of a set of classes, after prior training of course. I know I can get the class probabilities using the predict_proba method, which calculates them as "[...] the mean predicted class probabilities of the trees in the forest". In this question it is mentioned that: "The probabilities returned by a single tree are the normalized class histograms of the leaf a sample lands in." Now, I've been…

Is it necessary to run random forest with cross-validation at the same time?

萝らか妹 submitted on 2019-12-10 22:46:41
Question: Random forest is a robust algorithm. A random forest trains many small trees and provides an OOB accuracy. However, is it necessary to also run cross-validation with a random forest at the same time?

Answer 1: OOB error is an unbiased estimate of the error for random forests, so that's great. But what are you using the cross-validation for? If you are comparing the RF against some other algorithm that isn't using bagging in the same way, you want a low-variance way to compare them. You have to use…
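The answer is cut off, but its point (OOB is already an honest error estimate, and cross-validation mainly matters when comparing against other learners under one common resampling scheme) can be illustrated with a small sketch on iris; the choice of 10 folds and the use of caret here are assumptions for illustration, not part of the original answer.

library(randomForest)
library(caret)
data(iris)
set.seed(42)

## the OOB error estimate comes for free with the forest itself
rf <- randomForest(Species ~ ., data = iris, ntree = 500)
oob_error <- rf$err.rate[rf$ntree, "OOB"]

## an explicit 10-fold CV estimate, as one would use to compare against other models
cv_fit <- train(Species ~ ., data = iris, method = "rf",
                trControl = trainControl(method = "cv", number = 10))
cv_error <- 1 - max(cv_fit$results$Accuracy)

c(oob = oob_error, cv = cv_error)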