random-forest

Use of scikit Random Forest sample_weights

谁说我不能喝 · submitted on 2019-12-03 03:09:01
I've been trying to figure out scikit's Random Forest sample_weight use and I cannot explain some of the results I'm seeing. Fundamentally I need it to balance a classification problem with unbalanced classes. In particular, I was expecting that if I used a sample_weight array of all 1s I would get the same result as with sample_weight=None. Additionally, I was expecting that any array of equal weights (i.e. all 1s, or all 10s, or all 0.8s...) would give the same result. Perhaps my intuition of weights is wrong in this case. Here's the code:

import numpy as np
from sklearn import ensemble
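A minimal sketch of the invariance the asker expects (using iris rather than the asker's data, since the original script is truncated): with a fixed random_state, scikit-learn treats sample_weight=None and an all-ones array identically, and a uniform rescaling of the weights should not change the splits, because impurity comparisons are scale-invariant.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

def fit_predict(weights):
    # fresh estimator each time so only the weights differ
    clf = RandomForestClassifier(n_estimators=50, random_state=0)
    clf.fit(X, y, sample_weight=weights)
    return clf.predict(X)

pred_none = fit_predict(None)
pred_ones = fit_predict(np.ones(len(y)))
pred_tens = fit_predict(np.full(len(y), 10.0))

print(np.array_equal(pred_none, pred_ones))  # True
print(np.array_equal(pred_none, pred_tens))
```

For the actual goal of balancing unbalanced classes, `class_weight='balanced'` on the classifier (or `sklearn.utils.class_weight.compute_sample_weight`) is the usual route rather than hand-built weight arrays.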

What is the difference between cross_val_score with scoring='roc_auc' and roc_auc_score?

只愿长相守 · submitted on 2019-12-03 02:37:15
I am confused about the difference between the cross_val_score scoring metric 'roc_auc' and the roc_auc_score that I can just import and call directly. The documentation ( http://scikit-learn.org/stable/modules/model_evaluation.html#scoring-parameter ) indicates that specifying scoring='roc_auc' will use sklearn.metrics.roc_auc_score. However, when I implement GridSearchCV or cross_val_score with scoring='roc_auc' I receive very different numbers than when I call roc_auc_score directly. Here is my code to help demonstrate what I see:

# score the model using cross_val_score
rf =
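Since the asker's script is truncated, here is a hedged sketch of the two usual sources of the discrepancy: scoring='roc_auc' computes AUC per held-out fold from predict_proba scores, whereas a direct roc_auc_score call is often made on the full training set, and often on hard labels from predict(), which is a different quantity. The data here is synthetic, not the asker's.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, random_state=0)
rf = RandomForestClassifier(n_estimators=100, random_state=0)

# per-fold AUC on held-out data, computed from probability scores
cv_scores = cross_val_score(rf, X, y, cv=5, scoring='roc_auc')

rf.fit(X, y)
# AUC of hard 0/1 labels vs AUC of continuous scores, both in-sample
auc_labels = roc_auc_score(y, rf.predict(X))
auc_proba = roc_auc_score(y, rf.predict_proba(X)[:, 1])
print(cv_scores.mean(), auc_labels, auc_proba)
```

Comparing like with like means calling roc_auc_score on predict_proba output for the same held-out rows the cross-validation used.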

Difference between varImp (caret) and importance (randomForest) for Random Forest

柔情痞子 · submitted on 2019-12-03 02:36:21
I do not understand the difference between the varImp function (caret package) and the importance function (randomForest package) for a Random Forest model. I computed a simple RF classification model, and when computing variable importance I found that the "ranking" of predictors was not the same for both functions. Here is my code:

rfImp <- randomForest(Origin ~ ., data = TAll_CS, ntree = 2000, importance = TRUE)
importance(rfImp)

                            BREAST       LUNG  MeanDecreaseAccuracy  MeanDecreaseGini
Energy_GLCM_R1SC4NG3    -1.44116806  2.8918537             1.0929302         0.3712622
Contrast_GLCM_R1SC4NG3  -2.61146974  1.5848150            -0
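The same split exists outside R: impurity-based importance and permutation-based importance are computed differently and their rankings can legitimately disagree. A scikit-learn sketch on synthetic data (not the asker's caret/randomForest setup) showing the two measures side by side:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

X, y = make_classification(n_samples=300, n_features=6, n_informative=3,
                           random_state=0)
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# measure 1: mean decrease in impurity, accumulated during training
impurity_rank = np.argsort(rf.feature_importances_)[::-1]

# measure 2: drop in score when each column is shuffled
perm = permutation_importance(rf, X, y, n_repeats=10, random_state=0)
perm_rank = np.argsort(perm.importances_mean)[::-1]

print(impurity_rank, perm_rank)  # the two rankings need not agree
```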

What is out of bag error in Random Forests? [closed]

三世轮回 · submitted on 2019-12-03 01:32:52
Question (closed: this question needs to be more focused and is not currently accepting answers; closed last year): What is out of bag error in Random Forests? Is it the optimal parameter for finding the right number of trees in a Random Forest?

Answer 1: I will attempt to explain. Suppose our training data set is represented by T, and suppose the data set has M features (or attributes, or variables
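The answer above can be made concrete: each bootstrap sample leaves out roughly (1 - 1/n)^n ≈ e^-1 ≈ 36.8% of the rows, and those out-of-bag rows act as a built-in validation set. A scikit-learn sketch (iris is stand-in data; the question names no dataset):

```python
import math
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# fraction of rows a single bootstrap sample omits, for n rows
n = 1000
frac_oob = (1 - 1 / n) ** n   # approaches exp(-1) ≈ 0.368

X, y = load_iris(return_X_y=True)
rf = RandomForestClassifier(n_estimators=200, oob_score=True,
                            random_state=0).fit(X, y)
oob_error = 1 - rf.oob_score_  # each row scored only by trees that never saw it
print(frac_oob, oob_error)
```

Because the OOB error is a free generalization estimate, plotting it against the number of trees is a common (if rough) way to judge when adding trees stops helping.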

What does negative %IncMSE in RandomForest package mean?

ぐ巨炮叔叔 · submitted on 2019-12-03 00:06:53
I used randomForest for a regression problem. I used importance(rf, type=1) to get the %IncMSE for the variables, and one of them has a negative %IncMSE. Does this mean that this variable is bad for the model? I searched the Internet for answers but didn't find a clear one. I also found something strange in the model's summary (attached below): it seems that only one tree was used, although I defined ntree as 800.

model:
rf <- randomForest(var1 ~ var2 + var3 + .. + var35, data = d7depo, ntree = 800,
                   keep.forest = FALSE, importance = TRUE)
summary(rf)
       Length  Class   Mode
call        6  -none-  call
type        1  -none-
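A negative permutation importance such as a negative %IncMSE means shuffling the variable made predictions (slightly) better, which is what you see when a variable is essentially noise. A scikit-learn sketch on synthetic data (two features, one pure noise; not the asker's d7depo model) that reproduces the effect:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 400
X = np.column_stack([rng.normal(size=n), rng.normal(size=n)])  # column 1 is pure noise
y = 3 * X[:, 0] + rng.normal(scale=0.5, size=n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)

# on held-out data, the noise column's importance hovers around zero
# and can dip negative, just like a negative %IncMSE in R
result = permutation_importance(rf, X_te, y_te, n_repeats=20, random_state=0)
print(result.importances_mean)
```

(On the summary(rf) confusion: R's generic summary only lists the lengths of the components of the fit object, e.g. "call 6" is the length of the stored call, not a tree count; ntree=800 was still honored.)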

Using randomForest package in R, how to get probabilities from classification model?

你。 · submitted on 2019-12-02 18:15:37
TL;DR: Is there something I can flag in the original randomForest call to avoid having to re-run the predict function to get predicted categorical probabilities, instead of just the likely category?

Details: I am using the randomForest package. I have a model something like:

model <- randomForest(x = out.data[train.rows, feature.cols],
                      y = out.data[train.rows, response.col],
                      xtest = out.data[test.rows, feature.cols],
                      ytest = out.data[test.rows, response.col],
                      importance = TRUE)

where out.data is a data frame, with feature.cols a mixture of numeric and categorical features, while response.col is a TRUE
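In R, predict(model, newdata, type = "prob") returns the class-probability matrix, and when xtest is supplied in the original call the vote fractions are already stored in model$test$votes. The same distinction in scikit-learn, sketched on iris (stand-in data, not the asker's out.data):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

proba = rf.predict_proba(X)   # one row per sample, one column per class
labels = rf.predict(X)        # the arg-max class of the same rows
print(proba.shape)            # (150, 3)
print(np.allclose(proba.sum(axis=1), 1.0))  # True
```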

How to perform random forest/cross validation in R

本秂侑毒 · submitted on 2019-12-02 16:48:39
I'm unable to find a way of performing cross validation on a regression random forest model that I'm trying to produce. I have a dataset containing 1664 explanatory variables (different chemical properties) with one response variable (retention time). I'm trying to produce a regression random forest model in order to be able to predict the chemical properties of something given its retention time.

ID    RT (seconds)  1_MW    2_AMW  3_Sv   4_Se
4281  38            145.29  5.01   14.76  28.37
4952  40            132.19  6.29   11     21.28
4823  41            176.21  7.34   12.9   24.92
3840  41            174.24  6.7    13.99  26.48
3665  42            240.34  9.24   15.2   27.08
3591
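A scikit-learn sketch of k-fold cross validation for a regression random forest (synthetic data standing in for the chemical-properties table; in R the analogous tools are caret::train with trainControl(method = "cv") or rfcv in the randomForest package):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score

X, y = make_regression(n_samples=200, n_features=10, noise=5.0, random_state=0)
rf = RandomForestRegressor(n_estimators=100, random_state=0)

# 5-fold CV; scikit-learn reports negated MSE so that larger is better
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(rf, X, y, cv=cv, scoring='neg_mean_squared_error')
rmse = (-scores) ** 0.5
print(rmse.mean())
```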

R Random Forests Variable Importance

烂漫一生 · submitted on 2019-12-02 13:53:45
I am trying to use the randomForest package for classification in R. The Variable Importance Measures listed are:

mean raw importance score of variable x for class 0
mean raw importance score of variable x for class 1
MeanDecreaseAccuracy
MeanDecreaseGini

Now I know what these mean, in the sense that I know their definitions. What I want to know is how to use them: what these values mean in the context of how accurate they are, what a good value is, what a bad value is, what the maximums and minimums are, etc. If a variable has a high MeanDecreaseAccuracy or

Predict using randomForest package in R

瘦欲@ · submitted on 2019-12-02 06:33:42
How can I use the result of a randomForest call in R to predict labels on some unlabeled data (e.g. real-world input to be classified)? Code:

train_data = read.csv("train.csv")
input_data = read.csv("input.csv")
result_forest = randomForest(Label ~ ., data=train_data)
labeled_input = result_forest.predict(input_data) # I need something like this

train.csv:
a;b;c;label;
1;1;1;a;
2;2;2;b;
1;2;1;c;

input.csv:
a;b;c;
1;1;1;
2;1;2;

I need to get something like this:
a;b;c;label;
1;1;1;a;
2;1;2;b;

Let me know if this is what you are getting at. You train your randomforest with your training data:
#
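For the record, the R equivalent of the pseudo-call in the question is predict(result_forest, input_data), since predict dispatches on the fitted object. The same train-then-predict workflow in pandas/scikit-learn, using tiny frames that mirror the question's CSVs:

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# stand-ins for train.csv and input.csv from the question
train = pd.DataFrame({'a': [1, 2, 1], 'b': [1, 2, 2], 'c': [1, 2, 1],
                      'label': ['a', 'b', 'c']})
new = pd.DataFrame({'a': [1, 2], 'b': [1, 1], 'c': [1, 2]})

rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(train[['a', 'b', 'c']], train['label'])

# attach predicted labels to the unlabeled rows
new['label'] = rf.predict(new[['a', 'b', 'c']])
print(new)
```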

Caret Model random forest into PMML error

感情迁移 · submitted on 2019-12-02 04:44:44
Question: I would like to export a caret random forest model using the pmml library so I can use it for predictions in Java. Here is a reproduction of the error I am getting:

data(iris)
require(caret)
require(pmml)
rfGrid2 <- expand.grid(.mtry = c(1, 2))
fitControl2 <- trainControl(method = "repeatedcv",
                            number = NUMBER_OF_CV,
                            repeats = REPEATES)
model.Test <- train(Species ~ ., data = iris, method = "rf",
                    trControl = fitControl2,
                    ntree = NUMBER_OF_TREES,
                    importance = TRUE,
                    tuneGrid = rfGrid2)
print