random-forest

Retrieve the list of training feature names from a classifier

纵然是瞬间 submitted on 2019-12-01 07:39:04
Question: Is there a way to retrieve the list of feature names used to train a classifier, once it has been trained with the fit method? I would like to get this information before applying the classifier to unseen data. The data used for training is a pandas DataFrame and, in my case, the classifier is a RandomForestClassifier. Answer 1: Based on the documentation and previous experience, there is no way to get the list of features considered in at least one of the splits. Is your concern that you do not want
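A couple of workarounds are worth sketching here (not from the original answer). Modern scikit-learn records the training column names itself; on older versions you can stash them manually. The attribute feature_names_in_ was added in scikit-learn 1.0, after this exchange, so treat the first option as version-dependent:

    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier

    X = pd.DataFrame({"height": [1.2, 3.4, 5.6, 7.8],
                      "weight": [10.0, 20.0, 30.0, 40.0]})
    y = [0, 1, 0, 1]

    clf = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

    # scikit-learn >= 1.0: column names seen during fit are stored on the model
    print(clf.feature_names_in_)          # ['height' 'weight']

    # Older versions: record the names yourself before calling fit
    feature_names = list(X.columns)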

Scikit Learn - ValueError: Array contains NaN or infinity

泄露秘密 submitted on 2019-12-01 06:04:58
Question: There are no NaNs in my dataset; I have checked thoroughly. Any reason why I keep getting this error when trying to fit my classifier? Some of the numbers in the dataset are rather large and some go out to 10 decimal places, but I wouldn't think that should cause an error. I have included some of my pandas DataFrame info below, as well as the error itself. Any ideas? <class 'pandas.core.frame.DataFrame'> DatetimeIndex: 6244 entries, 1985-02-06 00:00:00 to 2009-11-05 00:00:00 Data
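Since the ValueError names two conditions, a dataset that is verifiably NaN-free can still fail on infinite values. A quick check along these lines often locates the offending rows (df is a hypothetical stand-in for whatever frame is passed to fit):

    import numpy as np
    import pandas as pd

    # df stands in for the DataFrame handed to the classifier
    df = pd.DataFrame({"a": [1.0, np.inf, 3.0],
                       "b": [4.0, 5.0, -np.inf]})

    # np.isfinite is False for NaN, +inf and -inf alike
    bad = ~np.isfinite(df.to_numpy())
    print(df[bad.any(axis=1)])   # rows that would trigger the ValueError

    # One common remedy: treat infinities as missing, then impute or drop
    clean = df.replace([np.inf, -np.inf], np.nan).dropna()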

RandomForestClassifier was given input with invalid label column error in Apache Spark

̄綄美尐妖づ submitted on 2019-12-01 05:59:43
I am trying to compute accuracy with 5-fold cross-validation using a random forest classifier in Scala, but I am getting the following error while running: java.lang.IllegalArgumentException: RandomForestClassifier was given input with invalid label column label, without the number of classes specified. See StringIndexer. The error is thrown at the line val cvModel = cv.fit(trainingData). The code I used for cross-validating the data set with a random forest is as follows:

import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.tuning.{ParamGridBuilder, CrossValidator}
import
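The exception itself points at the remedy: the classifier needs class-count metadata on the label column, which StringIndexer attaches. A minimal sketch of that wiring through the Python (PySpark) API rather than Scala, with hypothetical column names rawLabel and features:

    from pyspark.ml import Pipeline
    from pyspark.ml.classification import RandomForestClassifier
    from pyspark.ml.feature import StringIndexer
    from pyspark.ml.linalg import Vectors
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Toy frame; "rawLabel" carries no metadata about how many classes exist
    train = spark.createDataFrame(
        [(0.0, Vectors.dense(1.0, 2.0)),
         (1.0, Vectors.dense(3.0, 4.0)),
         (0.0, Vectors.dense(5.0, 6.0))],
        ["rawLabel", "features"])

    # StringIndexer writes nominal metadata (the number of classes) onto "label"
    indexer = StringIndexer(inputCol="rawLabel", outputCol="label")
    rf = RandomForestClassifier(labelCol="label", featuresCol="features")

    model = Pipeline(stages=[indexer, rf]).fit(train)

Handing the whole Pipeline to CrossValidator as its estimator keeps the indexing inside each fold, which avoids the same error during cross-validation.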

Difference of prediction results in random forest model

回眸只為那壹抹淺笑 submitted on 2019-12-01 01:23:32
I have built a random forest model and I got two different prediction results when I wrote two different lines of code to generate the predictions. I wonder which one is the right one. Here are my example dataframe and the code used:

dat <- read.table(text = "
cats birds wolfs snakes
0 3 9 7
1 3 8 4
1 1 2 8
0 1 2 3
0 1 8 3
1 6 1 2
0 6 7 1
1 6 1 5
0 5 9 7
1 3 8 7
1 4 2 7
0 1 2 3
0 7 6 3
1 6 1 1
0 6 3 9
1 6 1 1
", header = TRUE)

I've built a random forest model:

model <- randomForest(snakes ~ cats + birds + wolfs, data = dat, ntree = 20)
RF_pred <- data.frame(predict(model))
train <- cbind(train, RF_pred) #
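A likely source of the discrepancy, for the record (this is general randomForest behavior rather than anything stated in the excerpt): predict(model) with no newdata returns out-of-bag predictions, while predict(model, newdata = dat) runs every tree on every row, so the two vectors need not match. The same contrast in scikit-learn terms, offered as a Python analog with synthetic data:

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))
    y = 2.0 * X[:, 0] + rng.normal(scale=0.5, size=200)

    reg = RandomForestRegressor(n_estimators=20, oob_score=True,
                                random_state=0).fit(X, y)

    in_sample = reg.predict(X)   # every tree scores every row, its own training rows included
    oob = reg.oob_prediction_    # row i is scored only by trees that never saw it

    # The gap is real; the OOB version is the honest estimate of generalization
    print(np.abs(in_sample - oob).mean())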

What does the value of 'leaf' in the following xgboost model tree diagram mean?

≡放荡痞女 submitted on 2019-11-30 18:37:12
I am guessing that it is a conditional probability given that the above (tree branch) condition holds; however, I am not clear on it. If you want to read more about the data used or how this diagram was produced, go to: http://machinelearningmastery.com/visualize-gradient-boosting-decision-trees-xgboost-python/ Attribute leaf is the predicted value. In other words, if the evaluation of a tree model ends at that terminal node (aka leaf node), then this is the value that is returned. In pseudocode (the left-most branch of your tree model):

if (f1 < 127.5) {
    if (f7 < 28.5) {
        if (f5 < 45.4) {
            return 0
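One nuance worth adding, hedged because it depends on the objective: for binary:logistic models the leaf value is a raw margin (log-odds) contribution, and the leaves reached across all trees are summed and pushed through the logistic link to yield the final probability. A small sketch that prints those leaf values from a fitted model (data is synthetic):

    import numpy as np
    import xgboost as xgb

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 8))
    y = (X[:, 0] > 0).astype(int)

    model = xgb.XGBClassifier(n_estimators=3, max_depth=2).fit(X, y)

    # Text dump of the first tree: split conditions plus "leaf=<value>" lines;
    # <value> is the raw score this tree contributes when a row lands in that leaf
    print(model.get_booster().get_dump()[0])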

Random forests in R (empty classes in y and argument length 0)

人盡茶涼 submitted on 2019-11-30 17:49:30
I'm dealing with random forests for the first time and I'm having some trouble that I can't figure out. When I run the analysis on my whole dataset (about 3000 rows) I don't get any error message, but when I perform the same analysis on a subset of the dataset (about 300 rows) I get an error:

dataset <- read.csv("datasetNA.csv", sep=";", header=T)
names(dataset)
dataset2 <- dataset[complete.cases(dataset$response),]
library(randomForest)
dataset2 <- na.roughfix(dataset2)
data.rforest <- randomForest(dataset2$response ~ dataset2$predictorA + dataset2$predictorB + dataset2$predictorC + dataset2

Error in train.default(x, y, weights = w, …) : final tuning parameters could not be determined

孤者浪人 submitted on 2019-11-30 15:49:36
I am very new to machine learning and am attempting the forest cover prediction competition on Kaggle, but I am getting hung up pretty early on. I get the following error when I run the code below. Error in train.default(x, y, weights = w, ...) : final tuning parameters could not be determined In addition: There were 50 or more warnings (use warnings() to see the first 50)

# Load the libraries
library(ggplot2); library(caret); library(AppliedPredictiveModeling)
library(pROC)
library(Amelia)
set.seed(1234)

# Load the forest cover dataset from the csv file
rawdata <- read.csv("train.csv"
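In caret this error typically means every candidate model failed during resampling, which is why the message points at warnings(); the first few warnings usually name the real cause. The closest scikit-learn move, offered as a loose Python analog with a hypothetical tuning grid, is to make the search raise the underlying error instead of swallowing it:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import GridSearchCV

    X, y = make_classification(n_samples=200, random_state=0)

    grid = GridSearchCV(
        RandomForestClassifier(random_state=0),
        param_grid={"max_features": [1, 2, 4]},   # hypothetical grid
        cv=5,
        # "raise" stops at the first failing fit with the real traceback,
        # rather than silently scoring that candidate as NaN
        error_score="raise",
    )
    grid.fit(X, y)
    print(grid.best_params_)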

Implementing custom stopping metrics to optimize during training in an H2O model directly from R

房东的猫 submitted on 2019-11-30 15:07:47
I'm trying to implement the FBeta_Score() function of the MLmetrics R package:

FBeta_Score <- function(y_true, y_pred, positive = NULL, beta = 1) {
    Confusion_DF <- ConfusionDF(y_pred, y_true)
    if (is.null(positive) == TRUE)
        positive <- as.character(Confusion_DF[1, 1])
    Precision <- Precision(y_true, y_pred, positive)
    Recall <- Recall(y_true, y_pred, positive)
    Fbeta_Score <- (1 + beta^2) * (Precision * Recall) / (beta^2 * Precision + Recall)
    return(Fbeta_Score)
}

in the H2O distributed random forest model, and I want to optimize it during the training phase using the custom_metric_func option. The help
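For context, the custom-metric hook is exposed through H2O's Python client: a class with map/reduce/metric methods is uploaded with h2o.upload_custom_metric and its handle is passed as custom_metric_func. The sketch below makes two explicit assumptions to verify against your H2O release: that the per-row layout for a binomial model is pred = [label, p0, p1] and act = [label], and that the target algorithm/version supports custom metrics at all:

    import h2o
    from h2o.estimators import H2ORandomForestEstimator

    h2o.init()

    class FBetaMetric:
        # Accumulate confusion counts per row: [tp, fp, fn]
        def map(self, pred, act, w, o, model):
            tp = float(pred[0] == 1 and act[0] == 1)
            fp = float(pred[0] == 1 and act[0] == 0)
            fn = float(pred[0] == 0 and act[0] == 1)
            return [tp, fp, fn]

        # Combine partial counts from two data chunks elementwise
        def reduce(self, l, r):
            return [l[0] + r[0], l[1] + r[1], l[2] + r[2]]

        # Turn the pooled counts into an F-score (beta = 1 here)
        def metric(self, l):
            tp, fp, fn = l
            precision = tp / (tp + fp) if tp + fp else 0.0
            recall = tp / (tp + fn) if tp + fn else 0.0
            denom = precision + recall
            return 2 * precision * recall / denom if denom else 0.0

    fbeta_ref = h2o.upload_custom_metric(FBetaMetric, func_name="fbeta",
                                         func_file="fbeta_metric.py")
    drf = H2ORandomForestEstimator(ntrees=50, custom_metric_func=fbeta_ref)
    # drf.train(x=feature_cols, y="response", training_frame=train)  # hypothetical names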

cforest prints empty tree

心已入冬 submitted on 2019-11-30 14:45:54
I'm trying to use the cforest function (R, party package). This is what I do to construct the forest:

library("party")
set.seed(42)
readingSkills.cf <- cforest(score ~ ., data = readingSkills,
                            control = cforest_unbiased(mtry = 2, ntree = 50))

Then I want to print the first tree, so I do:

party:::prettytree(readingSkills.cf@ensemble[[1]],
                   names(readingSkills.cf@data@get("input")))

The result looks like this:

1) shoeSize <= 28.29018; criterion = 1, statistic = 89.711
  2) age <= 6; criterion = 1, statistic = 48.324
    3) age <= 5; criterion = 0.997, statistic = 8.917
      4)* weights = 0
    3) age > 5
      5)* weights = 0
  2)