random-forest

R randomForest - how to predict with a “getTree” tree

Submitted by 自作多情 on 2020-01-24 14:22:25
Question: Background: I can build a random forest in R:

    set.seed(1)
    library(randomForest)
    data(iris)
    model.rf <- randomForest(Species ~ ., data=iris, importance=TRUE, ntree=20, mtry=2)

I can predict values using the randomForest object I just made:

    my_pred <- predict(model.rf)
    plot(iris$Species, my_pred)

I can then peel off a random tree from the forest:

    idx <- sample(x = 1:20, size = 1, replace = FALSE)
    single_tree <- getTree(model.rf, k = idx)

Questions: How do I predict from a single tree pulled from…
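In R, getTree() returns a plain matrix describing the tree, not a model object, which is what makes predicting from it awkward. The idea is easier to see in scikit-learn, where each member of a forest is itself a fitted tree; the sketch below is a Python analogue, not the R solution itself:

```python
# Analogue of "predict with one tree from the forest" in scikit-learn:
# every element of RandomForestClassifier.estimators_ is a fitted
# DecisionTreeClassifier with its own predict() method.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
forest = RandomForestClassifier(n_estimators=20, max_features=2, random_state=1)
forest.fit(X, y)

single_tree = forest.estimators_[0]   # pull one tree out of the forest
tree_pred = single_tree.predict(X)    # predict with just that tree
print(tree_pred[:5])
```

A single tree's predictions will generally be noisier than the full forest's, since the forest averages away the variance of its members.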

Random Forest Classifier :To which class corresponds the probabilities

Submitted by 我只是一个虾纸丫 on 2020-01-24 11:17:11
Question: I am using RandomForestClassifier from pyspark.ml.classification. I run the model on a binary-class dataset and display the probabilities. I have the following in the probability column:

    +-----+----------+---------------------------------------+
    |label|prediction|probability                            |
    +-----+----------+---------------------------------------+
    |0.0  |0.0       |[0.9005918461098429,0.0994081538901571]|
    |1.0  |1.0       |[0.6051335859900139,0.3948664140099861]|
    +-----+----------+-----------------------------------…
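In Spark ML, the i-th entry of the probability vector corresponds to label index i (labels are assumed to be 0, 1, …, k-1), so in the table above the first entry is P(label=0.0). The same column-to-class mapping can be inspected directly in scikit-learn, used here as a locally runnable stand-in:

```python
# predict_proba columns are ordered to match clf.classes_ -- the same
# convention as Spark ML's probability vector, where index i = label i.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

X = np.array([[0.0], [0.1], [0.9], [1.0]])
y = np.array([0, 0, 1, 1])
clf = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

proba = clf.predict_proba(X)
print(clf.classes_)   # column order of proba
print(proba[0])       # [P(class 0), P(class 1)] for the first row
```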

RandomForestRegressor and feature_importances_ error

Submitted by 泪湿孤枕 on 2020-01-24 04:01:08
Question: I am struggling to pull the feature importances out of my RandomForestRegressor; I get:

    AttributeError: 'GridSearchCV' object has no attribute 'feature_importances_'

Does anyone know why there is no such attribute? According to the documentation it should exist? The full code:

    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import GridSearchCV

    # Running a RandomForestRegressor GridSearchCV to tune the model.
    parameter_candidates = {
        'n_estimators' :…
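The attribute exists on the forest, not on the search object: GridSearchCV is a wrapper, and after fitting with refit enabled (the default) the tuned estimator lives in best_estimator_. A minimal sketch, with a made-up synthetic dataset standing in for the asker's data:

```python
# feature_importances_ lives on the refit RandomForestRegressor inside
# GridSearchCV.best_estimator_, not on the GridSearchCV object itself.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

X, y = make_regression(n_samples=100, n_features=5, random_state=0)
grid = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid={"n_estimators": [10, 20]},
    cv=3,
)
grid.fit(X, y)

importances = grid.best_estimator_.feature_importances_
print(importances)  # one importance per feature, summing to 1
```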

randomForest Error: NA not permitted in predictors (but no NAs in data)

Submitted by 你离开我真会死。 on 2020-01-16 19:16:08
Question: So I am attempting to run the 'genie3' algorithm (ref: http://homepages.inf.ed.ac.uk/vhuynht/software.html) in R, which uses the 'randomForest' method. I am running into the following error:

    > weight.matrix <- get.weight.matrix(tmpLog2FC, input.idx=1:4551)
    Starting RF computations with 1000 trees/target gene, and 67 candidate input genes/tree node
    Computing gene 1/11805
    Error in randomForest.default(x, y, mtry = mtry, ntree = nb.trees, importance = TRUE, : NA not…
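A frequent cause of "NA not permitted" when the data looks clean is that the matrix contains Inf/-Inf rather than NA; with log2 fold changes (as the name tmpLog2FC suggests), log2(0) yields -Inf, which an NA check alone never flags. The distinction, sketched with NumPy as a stand-in for the R checks:

```python
# An NA/NaN check passes, yet the data still contains a non-finite value
# (-inf from log2(0)) that tree fitting will reject.
import numpy as np

x = np.log2(np.array([1.0, 2.0, 0.0, 4.0]))  # log2(0) -> -inf

print(np.isnan(x).any())      # False: no NaN, so an NA check passes
print(np.isfinite(x).all())   # False: but -inf is still a bad predictor
```

In R, the analogous check is any(!is.finite(x)) rather than any(is.na(x)).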

Different results with randomForest() and caret's randomForest (method = “rf”)

Submitted by 笑着哭i on 2020-01-12 03:27:16
Question: I am new to caret, and I just want to ensure that I fully understand what it's doing. To that end, I've been attempting to replicate the results I get from a randomForest() model using caret's train() function with method="rf". Unfortunately, I haven't been able to get matching results, and I'm wondering what I'm overlooking. I'll also add that, given that randomForest uses bootstrapping to generate the samples used to fit each of the ntree trees, and estimates error based on out-of-bag predictions, I'm…
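Part of the answer is that the two error estimates come from different resampling schemes: randomForest() reports out-of-bag (OOB) error, while caret's train() wraps the fit in its own resampling loop (bootstrap or CV) to pick mtry. The two estimates are close but generally not identical. The contrast, sketched in scikit-learn as a stand-in for the R pair:

```python
# OOB accuracy (built into the forest fit) vs. a separate cross-validation
# estimate of the same model -- similar, but computed over different
# resamples, so they need not match exactly.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
rf = RandomForestClassifier(n_estimators=100, oob_score=True, random_state=1)
rf.fit(X, y)

oob = rf.oob_score_                          # out-of-bag accuracy
cv = cross_val_score(rf, X, y, cv=5).mean()  # 5-fold CV accuracy
print(oob, cv)
```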

How to get the probability per instance in classifications models in spark.mllib

Submitted by 雨燕双飞 on 2020-01-09 11:56:32
Question: I'm using spark.mllib.classification.{LogisticRegressionModel, LogisticRegressionWithSGD} and spark.mllib.tree.RandomForest for classification. With these packages I produce classification models, but these models only predict a specific class per instance. In Weka, we can get the exact probability for each instance to be of each class. How can we do it using these packages? In LogisticRegressionModel we can set the threshold, so I've created a function that checks the results for each point on a…
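For the forest, the usual workaround when the model only returns a hard class is to collect the votes of the individual trees and take the fraction voting for each class as the probability estimate. The computation is sketched below with scikit-learn trees as a locally runnable stand-in for spark.mllib's:

```python
# Class probability as the fraction of trees voting for each class.
# With fully grown trees this reproduces predict_proba exactly, because
# each pure leaf contributes a one-hot vote.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
forest = RandomForestClassifier(n_estimators=25, random_state=0).fit(X, y)

votes = np.stack([t.predict(X) for t in forest.estimators_])   # (trees, n)
manual = np.stack([(votes == c).mean(axis=0) for c in forest.classes_], axis=1)
print(manual[0])  # per-class vote fractions for the first sample
```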

Creating a loop for different random forest training algorithms

Submitted by 时间秒杀一切 on 2020-01-07 03:05:54
Question: I'm trying to write a for loop to create various random forest models. I've stored the variables I would like to use in the different models in a list called list:

    list <- c("EXPG1 + EXPG2", "EXPG1 + EXPG2 + distance")

Then I try to loop over it to create predictions. What I finally want to achieve is this:

    modFit1 <- train(won ~ EXPG1 + EXPG2, data=training, method="rf", prox=TRUE)
    modFit2 <- train(won ~ EXPG1 + EXPG2 + distance, data=training, method="rf", prox=TRUE)

I have some issues trying…
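In R, the key step is turning each string into a formula, e.g. as.formula(paste("won ~", f)), inside the loop. The same pattern of fitting one model per feature subset translates to Python as follows; scikit-learn and a synthetic dataset are assumed as stand-ins for caret and the asker's training data:

```python
# One random forest per feature subset, collected in a dict instead of
# separate modFit1, modFit2, ... variables.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=80, n_features=3, n_informative=2,
                           n_redundant=0, random_state=0)
feature_sets = [[0, 1], [0, 1, 2]]  # like "EXPG1 + EXPG2" vs "... + distance"

models = {}
for i, cols in enumerate(feature_sets, start=1):
    clf = RandomForestClassifier(n_estimators=50, random_state=0)
    clf.fit(X[:, cols], y)
    models[f"modFit{i}"] = clf

print(sorted(models))  # ['modFit1', 'modFit2']
```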
