random-forest

R randomForest - how to predict with a “getTree” tree

Submitted by 自作多情 on 2020-01-24 14:22:25
Question: Background: I can build a random forest in R:

    set.seed(1)
    library(randomForest)
    data(iris)
    model.rf <- randomForest(Species ~ ., data=iris, importance=TRUE, ntree=20, mtry=2)

I can predict values using the randomForest object I just made:

    my_pred <- predict(model.rf)
    plot(iris$Species, my_pred)

I can then peel off a random tree from the forest:

    idx <- sample(x = 1:20, size = 1, replace = FALSE)
    single_tree <- getTree(model.rf, k = idx)

Questions: How do I predict from a single tree pulled from…
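In R, getTree() returns a plain matrix describing the tree, not a model object, which is what makes predicting from it awkward. The idea is easier to see in scikit-learn, where each member of a forest is itself a fitted tree; the sketch below is a Python analogue, not the R solution itself:

```python
# Analogue of "predict with one tree from the forest" in scikit-learn:
# every element of RandomForestClassifier.estimators_ is a fitted
# DecisionTreeClassifier with its own predict() method.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
forest = RandomForestClassifier(n_estimators=20, max_features=2, random_state=1)
forest.fit(X, y)

single_tree = forest.estimators_[0]   # pull one tree out of the forest
tree_pred = single_tree.predict(X)    # predict with just that tree
print(tree_pred[:5])
```

A single tree's predictions will generally be noisier than the full forest's, since the forest averages away the variance of its members.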

Random Forest Classifier :To which class corresponds the probabilities

Submitted by 我只是一个虾纸丫 on 2020-01-24 11:17:11
Question: I am using RandomForestClassifier from pyspark.ml.classification. I run the model on a binary-class dataset and display the probabilities. I have the following in the probability column:

    +-----+----------+---------------------------------------+
    |label|prediction|probability                            |
    +-----+----------+---------------------------------------+
    |0.0  |0.0       |[0.9005918461098429,0.0994081538901571]|
    |1.0  |1.0       |[0.6051335859900139,0.3948664140099861]|
    +-----+----------+-----------------------------------…
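In Spark ML, the i-th entry of the probability vector corresponds to label index i (labels are assumed to be 0, 1, …, k-1), so in the table above the first entry is P(label=0.0). The same column-to-class mapping can be inspected directly in scikit-learn, used here as a locally runnable stand-in:

```python
# predict_proba columns are ordered to match clf.classes_ -- the same
# convention as Spark ML's probability vector, where index i = label i.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

X = np.array([[0.0], [0.1], [0.9], [1.0]])
y = np.array([0, 0, 1, 1])
clf = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

proba = clf.predict_proba(X)
print(clf.classes_)   # column order of proba
print(proba[0])       # [P(class 0), P(class 1)] for the first row
```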

RandomForestRegressor and feature_importances_ error

Submitted by 泪湿孤枕 on 2020-01-24 04:01:08
Question: I am struggling to pull the feature importances out of my RandomForestRegressor; I get:

    AttributeError: 'GridSearchCV' object has no attribute 'feature_importances_'

Does anyone know why there is no such attribute? According to the documentation it should exist? The full code:

    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import GridSearchCV

    # Running a RandomForestRegressor GridSearchCV to tune the model.
    parameter_candidates = {
        'n_estimators' :…
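The attribute exists on the forest, not on the search object: GridSearchCV is a wrapper, and after fitting with refit enabled (the default) the tuned estimator lives in best_estimator_. A minimal sketch, with a made-up synthetic dataset standing in for the asker's data:

```python
# feature_importances_ lives on the refit RandomForestRegressor inside
# GridSearchCV.best_estimator_, not on the GridSearchCV object itself.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

X, y = make_regression(n_samples=100, n_features=5, random_state=0)
grid = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid={"n_estimators": [10, 20]},
    cv=3,
)
grid.fit(X, y)

importances = grid.best_estimator_.feature_importances_
print(importances)  # one importance per feature, summing to 1
```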

randomForest Error: NA not permitted in predictors (but no NAs in data)

Submitted by 你离开我真会死。 on 2020-01-16 19:16:08
Question: So I am attempting to run the 'genie3' algorithm (ref: http://homepages.inf.ed.ac.uk/vhuynht/software.html) in R, which uses the 'randomForest' method. I am running into the following error:

    > weight.matrix <- get.weight.matrix(tmpLog2FC, input.idx=1:4551)
    Starting RF computations with 1000 trees/target gene, and 67 candidate input genes/tree node
    Computing gene 1/11805
    Error in randomForest.default(x, y, mtry = mtry, ntree = nb.trees, importance = TRUE, : NA not…
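A frequent cause of "NA not permitted" when the data looks clean is that the matrix contains Inf/-Inf rather than NA; with log2 fold changes (as the name tmpLog2FC suggests), log2(0) yields -Inf, which an NA check alone never flags. The distinction, sketched with NumPy as a stand-in for the R checks:

```python
# An NA/NaN check passes, yet the data still contains a non-finite value
# (-inf from log2(0)) that tree fitting will reject.
import numpy as np

x = np.log2(np.array([1.0, 2.0, 0.0, 4.0]))  # log2(0) -> -inf

print(np.isnan(x).any())      # False: no NaN, so an NA check passes
print(np.isfinite(x).all())   # False: but -inf is still a bad predictor
```

In R, the analogous check is any(!is.finite(x)) rather than any(is.na(x)).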

Different results with randomForest() and caret's randomForest (method = “rf”)

Submitted by 笑着哭i on 2020-01-12 03:27:16
Question: I am new to caret, and I just want to ensure that I fully understand what it's doing. To that end, I've been attempting to replicate the results I get from a randomForest() model using caret's train() function with method="rf". Unfortunately, I haven't been able to get matching results, and I'm wondering what I'm overlooking. I'll also add that, given that randomForest uses bootstrapping to generate the samples used to fit each of the ntree trees, and estimates error based on out-of-bag predictions, I'm…
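Part of the answer is that the two error estimates come from different resampling schemes: randomForest() reports out-of-bag (OOB) error, while caret's train() wraps the fit in its own resampling loop (bootstrap or CV) to pick mtry. The two estimates are close but generally not identical. The contrast, sketched in scikit-learn as a stand-in for the R pair:

```python
# OOB accuracy (built into the forest fit) vs. a separate cross-validation
# estimate of the same model -- similar, but computed over different
# resamples, so they need not match exactly.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
rf = RandomForestClassifier(n_estimators=100, oob_score=True, random_state=1)
rf.fit(X, y)

oob = rf.oob_score_                          # out-of-bag accuracy
cv = cross_val_score(rf, X, y, cv=5).mean()  # 5-fold CV accuracy
print(oob, cv)
```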

How to get the probability per instance in classifications models in spark.mllib

Submitted by 雨燕双飞 on 2020-01-09 11:56:32
Question: I'm using spark.mllib.classification.{LogisticRegressionModel, LogisticRegressionWithSGD} and spark.mllib.tree.RandomForest for classification. With these packages I produce classification models, but these models only predict a specific class per instance. In Weka, we can get the exact probability for each instance to be of each class. How can we do it using these packages? In LogisticRegressionModel we can set the threshold, so I've created a function that checks the results for each point on a…
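For the forest, the usual workaround when the model only returns a hard class is to collect the votes of the individual trees and take the fraction voting for each class as the probability estimate. The computation is sketched below with scikit-learn trees as a locally runnable stand-in for spark.mllib's:

```python
# Class probability as the fraction of trees voting for each class.
# With fully grown trees this reproduces predict_proba exactly, because
# each pure leaf contributes a one-hot vote.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
forest = RandomForestClassifier(n_estimators=25, random_state=0).fit(X, y)

votes = np.stack([t.predict(X) for t in forest.estimators_])   # (trees, n)
manual = np.stack([(votes == c).mean(axis=0) for c in forest.classes_], axis=1)
print(manual[0])  # per-class vote fractions for the first sample
```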

Creating a loop for different random forest training algorithms

Submitted by 时间秒杀一切 on 2020-01-07 03:05:54
Question: I'm trying to write a for loop to create various random forest models. I've stored the variables I would like to use in the different models in a list called list:

    list <- c("EXPG1 + EXPG2", "EXPG1 + EXPG2 + distance")

Then I try to loop over it to create predictions. What I finally want to achieve is this:

    modFit1 <- train(won ~ EXPG1 + EXPG2, data=training, method="rf", prox=TRUE)
    modFit2 <- train(won ~ EXPG1 + EXPG2 + distance, data=training, method="rf", prox=TRUE)

I have some issues trying…
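In R, the key step is turning each string into a formula, e.g. as.formula(paste("won ~", f)), inside the loop. The same pattern of fitting one model per feature subset translates to Python as follows; scikit-learn and a synthetic dataset are assumed as stand-ins for caret and the asker's training data:

```python
# One random forest per feature subset, collected in a dict instead of
# separate modFit1, modFit2, ... variables.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=80, n_features=3, n_informative=2,
                           n_redundant=0, random_state=0)
feature_sets = [[0, 1], [0, 1, 2]]  # like "EXPG1 + EXPG2" vs "... + distance"

models = {}
for i, cols in enumerate(feature_sets, start=1):
    clf = RandomForestClassifier(n_estimators=50, random_state=0)
    clf.fit(X[:, cols], y)
    models[f"modFit{i}"] = clf

print(sorted(models))  # ['modFit1', 'modFit2']
```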
