random-forest

Retrieve the list of training feature names from a classifier

纵然是瞬间 submitted on 2019-12-01 07:39:04
Question: Is there a way to retrieve the list of feature names used to train a classifier, once it has been trained with the fit method? I would like to get this information before applying the classifier to unseen data. The data used for training is a pandas DataFrame and, in my case, the classifier is a RandomForestClassifier. Answer 1: Based on the documentation and previous experience, there is no way to get the list of features considered in at least one of the splits. Is your concern that you do not want
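A couple of workarounds are worth sketching here (not from the original answer). Modern scikit-learn records the training column names itself; on older versions you can stash them manually. The attribute feature_names_in_ was added in scikit-learn 1.0, after this exchange, so treat the first option as version-dependent:

    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier

    X = pd.DataFrame({"height": [1.2, 3.4, 5.6, 7.8],
                      "weight": [10.0, 20.0, 30.0, 40.0]})
    y = [0, 1, 0, 1]

    clf = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

    # scikit-learn >= 1.0: column names seen during fit are stored on the model
    print(clf.feature_names_in_)          # ['height' 'weight']

    # Older versions: record the names yourself before calling fit
    feature_names = list(X.columns)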

Scikit Learn - ValueError: Array contains NaN or infinity

泄露秘密 submitted on 2019-12-01 06:04:58
Question: There are no NaNs in my dataset; I have checked thoroughly. Any reason why I keep getting this error when trying to fit my classifier? Some of the numbers in the dataset are rather large and some go out to 10 decimal places, but I wouldn't think that should cause an error. I have included some of my pandas DataFrame info below, as well as the error itself. Any ideas? <class 'pandas.core.frame.DataFrame'> DatetimeIndex: 6244 entries, 1985-02-06 00:00:00 to 2009-11-05 00:00:00 Data
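Since the ValueError names two conditions, a dataset that is verifiably NaN-free can still fail on infinite values. A quick check along these lines often locates the offending rows (df is a hypothetical stand-in for whatever frame is passed to fit):

    import numpy as np
    import pandas as pd

    # df stands in for the DataFrame handed to the classifier
    df = pd.DataFrame({"a": [1.0, np.inf, 3.0],
                       "b": [4.0, 5.0, -np.inf]})

    # np.isfinite is False for NaN, +inf and -inf alike
    bad = ~np.isfinite(df.to_numpy())
    print(df[bad.any(axis=1)])   # rows that would trigger the ValueError

    # One common remedy: treat infinities as missing, then impute or drop
    clean = df.replace([np.inf, -np.inf], np.nan).dropna()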

RandomForestClassifier was given input with invalid label column error in Apache Spark

̄綄美尐妖づ submitted on 2019-12-01 05:59:43
I am trying to compute accuracy with 5-fold cross-validation using a random forest classifier in Scala, but I am getting the following error while running: java.lang.IllegalArgumentException: RandomForestClassifier was given input with invalid label column label, without the number of classes specified. See StringIndexer. The error is thrown at the line val cvModel = cv.fit(trainingData). The code I used for cross-validating the data set with a random forest is as follows:

import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.tuning.{ParamGridBuilder, CrossValidator}
import
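The exception itself points at the remedy: the classifier needs class-count metadata on the label column, which StringIndexer attaches. A minimal sketch of that wiring through the Python (PySpark) API rather than Scala, with hypothetical column names rawLabel and features:

    from pyspark.ml import Pipeline
    from pyspark.ml.classification import RandomForestClassifier
    from pyspark.ml.feature import StringIndexer
    from pyspark.ml.linalg import Vectors
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Toy frame; "rawLabel" carries no metadata about how many classes exist
    train = spark.createDataFrame(
        [(0.0, Vectors.dense(1.0, 2.0)),
         (1.0, Vectors.dense(3.0, 4.0)),
         (0.0, Vectors.dense(5.0, 6.0))],
        ["rawLabel", "features"])

    # StringIndexer writes nominal metadata (the number of classes) onto "label"
    indexer = StringIndexer(inputCol="rawLabel", outputCol="label")
    rf = RandomForestClassifier(labelCol="label", featuresCol="features")

    model = Pipeline(stages=[indexer, rf]).fit(train)

Handing the whole Pipeline to CrossValidator as its estimator keeps the indexing inside each fold, which avoids the same error during cross-validation.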

Difference of prediction results in random forest model

回眸只為那壹抹淺笑 submitted on 2019-12-01 01:23:32
I have built a random forest model and I got two different prediction results when I wrote two different lines of code to generate the predictions. I wonder which one is the right one. Here are my example dataframe and the code used:

dat <- read.table(text = "
cats birds wolfs snakes
0 3 9 7
1 3 8 4
1 1 2 8
0 1 2 3
0 1 8 3
1 6 1 2
0 6 7 1
1 6 1 5
0 5 9 7
1 3 8 7
1 4 2 7
0 1 2 3
0 7 6 3
1 6 1 1
0 6 3 9
1 6 1 1
", header = TRUE)

I've built a random forest model:

model <- randomForest(snakes ~ cats + birds + wolfs, data = dat, ntree = 20)
RF_pred <- data.frame(predict(model))
train <- cbind(train, RF_pred) #
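A likely source of the discrepancy, for the record (this is general randomForest behavior rather than anything stated in the excerpt): predict(model) with no newdata returns out-of-bag predictions, while predict(model, newdata = dat) runs every tree on every row, so the two vectors need not match. The same contrast in scikit-learn terms, offered as a Python analog with synthetic data:

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))
    y = 2.0 * X[:, 0] + rng.normal(scale=0.5, size=200)

    reg = RandomForestRegressor(n_estimators=20, oob_score=True,
                                random_state=0).fit(X, y)

    in_sample = reg.predict(X)   # every tree scores every row, its own training rows included
    oob = reg.oob_prediction_    # row i is scored only by trees that never saw it

    # The gap is real; the OOB version is the honest estimate of generalization
    print(np.abs(in_sample - oob).mean())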

What does the value of 'leaf' in the following xgboost model tree diagram mean?

≡放荡痞女 submitted on 2019-11-30 18:37:12
I am guessing that it is a conditional probability given that the above (tree branch) condition holds; however, I am not clear on it. If you want to read more about the data used or how this diagram was produced, go to: http://machinelearningmastery.com/visualize-gradient-boosting-decision-trees-xgboost-python/ Attribute leaf is the predicted value. In other words, if the evaluation of a tree model ends at that terminal node (aka leaf node), then this is the value that is returned. In pseudocode (the left-most branch of your tree model):

if (f1 < 127.5) {
    if (f7 < 28.5) {
        if (f5 < 45.4) {
            return 0
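One nuance worth adding, hedged because it depends on the objective: for binary:logistic models the leaf value is a raw margin (log-odds) contribution, and the leaves reached across all trees are summed and pushed through the logistic link to yield the final probability. A small sketch that prints those leaf values from a fitted model (data is synthetic):

    import numpy as np
    import xgboost as xgb

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 8))
    y = (X[:, 0] > 0).astype(int)

    model = xgb.XGBClassifier(n_estimators=3, max_depth=2).fit(X, y)

    # Text dump of the first tree: split conditions plus "leaf=<value>" lines;
    # <value> is the raw score this tree contributes when a row lands in that leaf
    print(model.get_booster().get_dump()[0])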

Random forests in R (empty classes in y and argument length 0)

人盡茶涼 submitted on 2019-11-30 17:49:30
I'm dealing with random forests for the first time and I'm having some trouble that I can't figure out. When I run the analysis on my whole dataset (about 3000 rows) I don't get any error message, but when I perform the same analysis on a subset of the dataset (about 300 rows) I get an error:

dataset <- read.csv("datasetNA.csv", sep=";", header=T)
names(dataset)
dataset2 <- dataset[complete.cases(dataset$response),]
library(randomForest)
dataset2 <- na.roughfix(dataset2)
data.rforest <- randomForest(dataset2$response ~ dataset2$predictorA + dataset2$predictorB + dataset2$predictorC + dataset2

Error in train.default(x, y, weights = w, …) : final tuning parameters could not be determined

孤者浪人 submitted on 2019-11-30 15:49:36
I am very new to machine learning and am attempting the forest cover prediction competition on Kaggle, but I am getting hung up pretty early on. I get the following error when I run the code below. Error in train.default(x, y, weights = w, ...) : final tuning parameters could not be determined In addition: There were 50 or more warnings (use warnings() to see the first 50)

# Load the libraries
library(ggplot2); library(caret); library(AppliedPredictiveModeling)
library(pROC)
library(Amelia)
set.seed(1234)

# Load the forest cover dataset from the csv file
rawdata <- read.csv("train.csv"
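In caret this error typically means every candidate model failed during resampling, which is why the message points at warnings(); the first few warnings usually name the real cause. The closest scikit-learn move, offered as a loose Python analog with a hypothetical tuning grid, is to make the search raise the underlying error instead of swallowing it:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import GridSearchCV

    X, y = make_classification(n_samples=200, random_state=0)

    grid = GridSearchCV(
        RandomForestClassifier(random_state=0),
        param_grid={"max_features": [1, 2, 4]},   # hypothetical grid
        cv=5,
        # "raise" stops at the first failing fit with the real traceback,
        # rather than silently scoring that candidate as NaN
        error_score="raise",
    )
    grid.fit(X, y)
    print(grid.best_params_)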

Implementing custom stopping metrics to optimize during training in an H2O model directly from R

房东的猫 submitted on 2019-11-30 15:07:47
I'm trying to implement the FBeta_Score() function of the MLmetrics R package:

FBeta_Score <- function(y_true, y_pred, positive = NULL, beta = 1) {
    Confusion_DF <- ConfusionDF(y_pred, y_true)
    if (is.null(positive) == TRUE)
        positive <- as.character(Confusion_DF[1, 1])
    Precision <- Precision(y_true, y_pred, positive)
    Recall <- Recall(y_true, y_pred, positive)
    Fbeta_Score <- (1 + beta^2) * (Precision * Recall) / (beta^2 * Precision + Recall)
    return(Fbeta_Score)
}

in the H2O distributed random forest model, and I want to optimize it during the training phase using the custom_metric_func option. The help
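For context, the custom-metric hook is exposed through H2O's Python client: a class with map/reduce/metric methods is uploaded with h2o.upload_custom_metric and its handle is passed as custom_metric_func. The sketch below makes two explicit assumptions to verify against your H2O release: that the per-row layout for a binomial model is pred = [label, p0, p1] and act = [label], and that the target algorithm/version supports custom metrics at all:

    import h2o
    from h2o.estimators import H2ORandomForestEstimator

    h2o.init()

    class FBetaMetric:
        # Accumulate confusion counts per row: [tp, fp, fn]
        def map(self, pred, act, w, o, model):
            tp = float(pred[0] == 1 and act[0] == 1)
            fp = float(pred[0] == 1 and act[0] == 0)
            fn = float(pred[0] == 0 and act[0] == 1)
            return [tp, fp, fn]

        # Combine partial counts from two data chunks elementwise
        def reduce(self, l, r):
            return [l[0] + r[0], l[1] + r[1], l[2] + r[2]]

        # Turn the pooled counts into an F-score (beta = 1 here)
        def metric(self, l):
            tp, fp, fn = l
            precision = tp / (tp + fp) if tp + fp else 0.0
            recall = tp / (tp + fn) if tp + fn else 0.0
            denom = precision + recall
            return 2 * precision * recall / denom if denom else 0.0

    fbeta_ref = h2o.upload_custom_metric(FBetaMetric, func_name="fbeta",
                                         func_file="fbeta_metric.py")
    drf = H2ORandomForestEstimator(ntrees=50, custom_metric_func=fbeta_ref)
    # drf.train(x=feature_cols, y="response", training_frame=train)  # hypothetical names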

cforest prints empty tree

心已入冬 submitted on 2019-11-30 14:45:54
I'm trying to use the cforest function (R, party package). This is what I do to construct the forest:

library("party")
set.seed(42)
readingSkills.cf <- cforest(score ~ ., data = readingSkills,
                            control = cforest_unbiased(mtry = 2, ntree = 50))

Then I want to print the first tree, so I do:

party:::prettytree(readingSkills.cf@ensemble[[1]],
                   names(readingSkills.cf@data@get("input")))

The result looks like this:

1) shoeSize <= 28.29018; criterion = 1, statistic = 89.711
  2) age <= 6; criterion = 1, statistic = 48.324
    3) age <= 5; criterion = 0.997, statistic = 8.917
      4)* weights = 0
    3) age > 5
      5)* weights = 0
  2)