random-forest

Got continuous is not supported error in RandomForestRegressor

和自甴很熟 submitted on 2020-04-29 08:56:08
Question: I'm just trying to do a simple RandomForestRegressor example, but while testing the accuracy I get this error:

/Users/noppanit/anaconda/lib/python2.7/site-packages/sklearn/metrics/classification.pyc in accuracy_score(y_true, y_pred, normalize, sample_weight)
    177
    178     # Compute accuracy for each possible representation
--> 179     y_type, y_true, y_pred = _check_targets(y_true, y_pred)
    180     if y_type.startswith('multilabel'):
    181         differing_labels = count_nonzero(y_true - y_pred, axis=1)
/Users
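The traceback comes from `accuracy_score`, which is a classification metric. Its `_check_targets` call rejects continuous targets, and continuous values are exactly what a regressor predicts. Below is a minimal sketch that scores the regressor with regression metrics instead. It assumes synthetic data from `make_regression` in place of the asker's dataset.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import accuracy_score, mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic regression data (hypothetical stand-in for the real dataset)
X, y = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# accuracy_score expects discrete class labels, so continuous targets fail
try:
    accuracy_score(y_test, y_pred)
    accuracy_error = None
except ValueError as err:
    accuracy_error = str(err)
print("accuracy_score:", accuracy_error)

# Regression metrics are the appropriate way to evaluate a regressor
r2 = r2_score(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)
print(f"R^2: {r2:.3f}, MAE: {mae:.3f}")
```

`model.score(X_test, y_test)` on a regressor returns the same R^2 value as `r2_score`.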

Python NLP - ValueError: could not convert string to float: 'UKN'

淺唱寂寞╮ submitted on 2020-04-18 12:35:02
Question: I'm trying to train a random forest regressor to predict the hourly wage of an employee from the supplied job description. Note: I've signed an NDA and cannot upload real data. The "observation" below is synthetic:
sample_row = {'job_posting_id': 'id_01', 'buyer_vertical': 'Business Services', 'currency': 'USD', 'fg_onet_code': '43-9011.00', 'jp_title': 'Computer Operator', 'jp_description': "Performs information security-related risk and compliance activities, including but not limited to
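Random forests in scikit-learn only accept numeric features, so every string column has to be encoded first. That includes any placeholder value such as 'UKN'. This is a minimal sketch under a few assumptions:

- The column names come from `sample_row`.
- The target column `hourly_wage` and all row values are invented for illustration.
- The question does not say which column actually holds 'UKN', so here it appears as a hypothetical category value.

Inside a `Pipeline`, the categorical columns are one-hot encoded and `jp_description` is converted to TF-IDF features.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical synthetic rows: column names follow sample_row, values are invented
df = pd.DataFrame({
    "job_posting_id": ["id_01", "id_02", "id_03", "id_04", "id_05", "id_06"],
    "buyer_vertical": ["Business Services", "Healthcare", "UKN",
                       "Business Services", "Healthcare", "UKN"],
    "currency": ["USD"] * 6,
    "fg_onet_code": ["43-9011.00", "15-1121.00", "43-9011.00",
                     "UKN", "15-1121.00", "43-9011.00"],
    "jp_title": ["Computer Operator", "Systems Analyst", "Computer Operator",
                 "Data Clerk", "Systems Analyst", "Computer Operator"],
    "jp_description": [
        "risk and compliance activities",
        "analyze systems and requirements",
        "operate computer equipment",
        "enter data into systems",
        "design and analyze information systems",
        "monitor computer operations",
    ],
    "hourly_wage": [32.0, 45.0, 28.0, 20.0, 48.0, 30.0],  # hypothetical target
})
X = df.drop(columns="hourly_wage")
y = df["hourly_wage"]

# Passing raw strings straight to the forest reproduces the ValueError
try:
    RandomForestRegressor().fit(X, y)
    raw_fit_error = None
except ValueError as err:
    raw_fit_error = str(err)

categorical = ["buyer_vertical", "currency", "fg_onet_code", "jp_title"]
preprocess = ColumnTransformer([
    # One indicator column per category; "UKN" simply becomes its own category
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
    # A string (not a list) selects a 1-D column, which TfidfVectorizer expects
    ("text", TfidfVectorizer(), "jp_description"),
])  # job_posting_id is dropped, since remainder="drop" is the default

model = Pipeline([
    ("prep", preprocess),
    ("rf", RandomForestRegressor(n_estimators=50, random_state=0)),
])
model.fit(X, y)

# A category never seen in training is ignored by the encoder
new_row = X.iloc[[0]].assign(buyer_vertical="Retail")
pred = model.predict(new_row)
print(pred)
```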

How to calculate class weights for Random forests

青春壹個敷衍的年華 submitted on 2020-02-24 12:22:08
Question: I have datasets for 2 classes on which I have to perform binary classification. I chose random forest as the classifier because it gave me the best accuracy among the models I tried. Dataset-1 contains 462 data points and dataset-2 contains 735 data points. I noticed that my data has a minor class imbalance, so I tried to optimise my training model and retrained it with class weights. I provided the following class weights:
cwt <- c(0.385,0.614) # Class weights
ss <- c(300
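One common way to derive class weights is scikit-learn's "balanced" heuristic, `n_samples / (n_classes * count_c)`. It gives the minority class the larger weight. Below is a Python sketch that uses the class sizes from the question (462 and 735). The feature matrix is random noise, included only so that the classifier fit runs.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.utils.class_weight import compute_class_weight

# Labels with the class sizes from the question: 462 of class 0, 735 of class 1
y = np.array([0] * 462 + [1] * 735)
classes = np.array([0, 1])

# "balanced" weights: n_samples / (n_classes * count of each class)
weights = compute_class_weight(class_weight="balanced", classes=classes, y=y)
manual = len(y) / (len(classes) * np.bincount(y))
normalized = weights / weights.sum()  # rescaled to sum to 1
print(weights, normalized)

# Random features (hypothetical), only to show the weights being passed in
rng = np.random.default_rng(0)
X = rng.normal(size=(len(y), 4))
clf = RandomForestClassifier(
    n_estimators=100,
    class_weight={int(c): w for c, w in zip(classes, weights)},
    random_state=0,
).fit(X, y)
```

After normalizing to sum to 1, the weights are about 0.614 for the 462-point class and 0.386 for the 735-point class. Whether that matches the intent of `cwt` depends on which class R treats as first, and the question does not show that. Passing `class_weight="balanced"` directly should be equivalent here.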

Variable importance with ranger

十年热恋 submitted on 2020-02-18 08:43:17
Question: I trained a random forest using caret + ranger.
fit <- train(
  y ~ x1 + x2
  ,data = total_set
  ,method = "ranger"
  ,trControl = trainControl(method="cv", number = 5, allowParallel = TRUE, verbose = TRUE)
  ,tuneGrid = expand.grid(mtry = c(4,5,6))
  ,importance = 'impurity'
)
Now I'd like to see the importance of the variables. However, none of these work:
> importance(fit)
Error in UseMethod("importance") : no applicable method for 'importance' applied to an object of class "c('train', 'train.formula')
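For reference, caret's `varImp()` generic is the usual way to query a `train` object. The sketch below is a Python analogue and not the caret/ranger API. scikit-learn's random forests expose impurity-based importances through `feature_importances_`, which is the same idea as ranger's `importance = 'impurity'`. scikit-learn normalizes these values to sum to 1. The data is synthetic, with hypothetical columns `x1` and `x2` plus an irrelevant `noise` column.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic data: y depends strongly on x1, weakly on x2, not at all on noise
rng = np.random.default_rng(0)
n = 400
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
noise = rng.normal(size=n)  # hypothetical irrelevant column
y = 3 * x1 + 0.5 * x2 + rng.normal(scale=0.1, size=n)

X = np.column_stack([x1, x2, noise])
names = ["x1", "x2", "noise"]

# max_features plays the role of ranger's mtry
rf = RandomForestRegressor(n_estimators=200, max_features=2, random_state=0).fit(X, y)

# Mean decrease in impurity per feature, largest first
ranked = sorted(zip(names, rf.feature_importances_), key=lambda t: t[1], reverse=True)
for name, imp in ranked:
    print(f"{name}: {imp:.3f}")
```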

Scala: how to know which probability correspond to which class?

旧街凉风 submitted on 2020-01-26 04:54:44
Question: I created a random forest classifier to predict something. The label is either "yes" (=1.0) or "no" (=0.0). I applied my model to a test set. Here is my code and my result for 20 rows:
import org.apache.spark.ml.tuning.CrossValidatorModel
import org.apache.spark.sql.types._
import org.apache.spark.sql._
import org.apache.spark.sql.functions.udf
import org.apache.spark.sql.functions._
var modelrf = CrossValidatorModel.load("modelSupervise/newModel")
var test = spark.sql("""select * from dc.newTest"""
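In Spark ML, labels are expected to be indexed as 0.0, 1.0, and so on, and the `probability` vector is indexed the same way. Element 0 is therefore P(label = 0.0, "no") and element 1 is P(label = 1.0, "yes"). scikit-learn uses the same bookkeeping: column j of `predict_proba` belongs to `classes_[j]`. Below is a Python sketch with hypothetical data labeled 1.0/0.0. It is an analogue, not Spark code.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical data: label 1.0 = "yes", 0.0 = "no"
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = np.where(X[:, 0] > 0, 1.0, 0.0)

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
proba = clf.predict_proba(X[:20])

# Column j of predict_proba is the probability of clf.classes_[j]
print(clf.classes_)  # [0. 1.]
yes_col = int(np.flatnonzero(clf.classes_ == 1.0)[0])
p_yes = proba[:, yes_col]
p_no = proba[:, 1 - yes_col]
print(p_yes[:5])
```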

Forecasting future occurrences with Random Forest

北城以北 submitted on 2020-01-25 04:14:06
Question: I'm currently exploring the use of random forests to predict future numbers of occurrences (my ARIMA model gave really bad forecasts, so I'm evaluating other options). I'm fully aware that the bad results might be due to the fact that I don't have much data and its quality isn't the greatest. My initial data consisted simply of the number of occurrences per date. I then added separate columns representing the day, month, year, day of the week (which was later one-hot encoded)
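Below is a sketch of that feature setup in Python. It assumes a hypothetical daily series of Poisson counts and builds two kinds of features:

- Calendar columns plus a one-hot day of week, as described in the question.
- Two lag features (`lag_1`, `lag_7`), which are an added assumption.

The train/test split is time-ordered rather than shuffled, so every test row comes after the training rows.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Hypothetical daily counts, higher at weekends
rng = np.random.default_rng(0)
dates = pd.date_range("2019-01-01", periods=200, freq="D")
weekend = np.asarray(dates.dayofweek >= 5)
df = pd.DataFrame({"date": dates, "occurrences": rng.poisson(10 + 5 * weekend)})

# Calendar features, as described in the question
df["day"] = df["date"].dt.day
df["month"] = df["date"].dt.month
df["year"] = df["date"].dt.year
dow = pd.get_dummies(df["date"].dt.dayofweek, prefix="dow")  # one-hot day of week
df = pd.concat([df, dow], axis=1)

# Lag features (an added assumption, not from the question)
df["lag_1"] = df["occurrences"].shift(1)
df["lag_7"] = df["occurrences"].shift(7)
df = df.dropna()

features = [c for c in df.columns if c not in ("date", "occurrences")]
train, test = df.iloc[:-30], df.iloc[-30:]  # time-ordered split, no shuffling

rf = RandomForestRegressor(n_estimators=200, random_state=0)
rf.fit(train[features], train["occurrences"])
pred = rf.predict(test[features])
mae = np.mean(np.abs(pred - test["occurrences"].to_numpy()))
print(f"one-step-ahead MAE: {mae:.2f}")
```

The test rows use observed lag values, so this is a one-step-ahead evaluation. Forecasting further out would require feeding predictions back in as lags. A random forest regressor also cannot predict values outside the range of its training targets, so a feature like `year` will not extrapolate a trend.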

R randomForest - how to predict with a “getTree” tree

不羁的心 submitted on 2020-01-24 14:23:08
Question: Background: I can make a random forest in R:
set.seed(1)
library(randomForest)
data(iris)
model.rf <- randomForest(Species ~ ., data=iris, importance=TRUE, ntree=20, mtry = 2)
I can predict values using the randomForest object that I just made:
my_pred <- predict(model.rf)
plot(iris$Species,my_pred)
I can then peel off a random tree from the forest:
idx <- sample(x = 1:20,size = 1,replace = F)
single_tree <- getTree(model.rf,k=idx)
Questions: How do I predict from a single tree pulled from
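In scikit-learn, the individual trees of a fitted forest are stored in `estimators_`, which is the closest Python analogue of `getTree`. Unlike those estimators, R's `getTree` returns a table describing the tree rather than a fitted model object. One subtlety in scikit-learn is that the trees are fit on integer-encoded labels, so their output must be mapped back through the forest's `classes_`. Below is a sketch on iris that mirrors the R code. It is an analogue, not the randomForest API.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()
X = iris.data
y = iris.target_names[iris.target]  # string labels, like the R factor Species

# 20 trees, 2 features tried per split (mirrors ntree=20, mtry=2)
forest = RandomForestClassifier(n_estimators=20, max_features=2, random_state=1).fit(X, y)

# Pick one tree at random, like sample(1:20, 1) in the R code
rng = np.random.default_rng(1)
idx = rng.integers(len(forest.estimators_))
single_tree = forest.estimators_[idx]

# Trees predict encoded labels (0, 1, 2); map them back through forest.classes_
tree_labels = forest.classes_[np.argmax(single_tree.predict_proba(X), axis=1)]
print(idx, tree_labels[:5])

# scikit-learn's forest averages the trees' class probabilities
mean_proba = np.mean([t.predict_proba(X) for t in forest.estimators_], axis=0)
print(np.allclose(mean_proba, forest.predict_proba(X)))
```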