random-forest

How to set seed for random simulations with foreach and doMC packages?

Submitted by 谁说我不能喝 on 2019-11-30 03:03:24
I need to do some simulations, and for debugging purposes I want to use set.seed to get the same result. Here is an example of what I am trying to do:

```r
library(foreach)
library(doMC)
registerDoMC(2)
set.seed(123)
a <- foreach(i=1:2, .combine=cbind) %dopar% { rnorm(5) }
set.seed(123)
b <- foreach(i=1:2, .combine=cbind) %dopar% { rnorm(5) }
```

Objects a and b should be identical, i.e. sum(abs(a-b)) should be zero, but this is not the case. Am I doing something wrong, or have I stumbled onto some feature? I am able to reproduce this on two different systems with R 2.13 and R 2.14.

My default answer used to…
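The standard remedy for this is the doRNG package, which replaces `%dopar%` with `%dorng%` to give each iteration its own reproducible random stream. A minimal sketch, assuming doRNG is installed:

```r
# Reproducible parallel RNG via doRNG's %dorng% operator.
# With plain %dopar%, each worker draws from an unseeded stream,
# so set.seed() on the master has no effect on the workers.
library(foreach)
library(doMC)
library(doRNG)
registerDoMC(2)

set.seed(123)
a <- foreach(i = 1:2, .combine = cbind) %dorng% { rnorm(5) }
set.seed(123)
b <- foreach(i = 1:2, .combine = cbind) %dorng% { rnorm(5) }
identical(a, b)  # TRUE: both runs reproduce the same streams
```

Alternatively, `registerDoRNG(123)` once up front keeps existing `%dopar%` code unchanged.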

Plot trees for a Random Forest in Python with Scikit-Learn

Submitted by 被刻印的时光 ゝ on 2019-11-30 01:57:13
I want to plot a decision tree from a random forest, so I created the following code:

```python
clf = RandomForestClassifier(n_estimators=100)

import pydotplus
import six
from sklearn import tree

dotfile = six.StringIO()
i_tree = 0
for tree_in_forest in clf.estimators_:
    if i_tree < 1:
        tree.export_graphviz(tree_in_forest, out_file=dotfile)
        pydotplus.graph_from_dot_data(dotfile.getvalue()).write_png('dtree' + str(i_tree) + '.png')
    i_tree = i_tree + 1
```

But it doesn't generate anything. Do you have an idea how to plot a decision tree from a random forest? Thank you.

Assuming your Random Forest model is already…
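Note that the code above never calls `fit()`, so `clf.estimators_` is empty and the loop body never runs. A minimal sketch of the working pattern, using the bundled iris data (the PNG-rendering step with pydotplus is the same as above, so only the DOT export is shown):

```python
# The forest must be fitted before .estimators_ contains any trees.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import export_graphviz

X, y = load_iris(return_X_y=True)
clf = RandomForestClassifier(n_estimators=10, random_state=0)
clf.fit(X, y)  # without this, clf.estimators_ does not exist

# out_file=None returns the DOT source as a string,
# which pydotplus/graphviz can then render to PNG.
dot = export_graphviz(clf.estimators_[0], out_file=None)
print(dot.splitlines()[0])
```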

Combining random forests built with different training sets in R

Submitted by 青春壹個敷衍的年華 on 2019-11-30 01:28:07
I am new to R (day 2) and have been tasked with building a forest of random forests. Each individual random forest will be built using a different training set, and we will combine all the forests at the end to make predictions. I am implementing this in R and am having some difficulty combining two forests not built using the same set. My attempt is as follows:

```r
d1 = read.csv("../data/rr/train/10/chunk0.csv", header=TRUE)
d2 = read.csv("../data/rr/train/10/chunk1.csv", header=TRUE)
rf1 = randomForest(A55~., data=d1, ntree=10)
rf2 = randomForest(A55~., data=d2, ntree=10)
rf = combine(rf1, rf2)
```

This…
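For reference, `randomForest::combine()` does work when the forests share the same response and predictor set, even if they were grown on different rows. A self-contained sketch on the built-in iris data (the chunked CSVs above are the asker's own files and are not reproduced here):

```r
# combine() merges two forests grown on disjoint row subsets of the
# same variables; the result has the union of the trees.
library(randomForest)
set.seed(1)
idx <- sample(nrow(iris), 75)
rf1 <- randomForest(Species ~ ., data = iris[idx, ], ntree = 10)
rf2 <- randomForest(Species ~ ., data = iris[-idx, ], ntree = 10)
rf  <- combine(rf1, rf2)
rf$ntree  # 20 trees in the combined forest
```

Note that combined forests lose out-of-bag error estimates, since the OOB bookkeeping is only valid per training set.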

Random forests in R (empty classes in y and argument length 0)

Submitted by 荒凉一梦 on 2019-11-30 01:24:38
I'm dealing with random forests for the first time and I'm having some trouble that I can't figure out. When I run the analysis on my whole dataset (about 3000 rows) I don't get any error message, but when I perform the same analysis on a subset of my dataset (about 300 rows) I get an error:

```r
dataset <- read.csv("datasetNA.csv", sep=";", header=T)
names(dataset)
dataset2 <- dataset[complete.cases(dataset$response),]
library(randomForest)
dataset2 <- na.roughfix(dataset2)
data.rforest <- …
```
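A common cause of the "empty classes in y" error (a sketch of the usual diagnosis, not a certainty about the asker's data): subsetting a data frame keeps all factor levels, so a 300-row subset can retain response levels that now have zero rows, and randomForest refuses to fit on them. `droplevels()` removes the empty levels:

```r
# Reproduce the situation on iris: filtering out one species leaves
# its factor level behind with zero rows.
library(randomForest)
sub <- iris[iris$Species != "setosa", ]
table(sub$Species)            # "setosa" still listed, with 0 rows

sub$Species <- droplevels(sub$Species)  # discard the empty class
rf <- randomForest(Species ~ ., data = sub, ntree = 10)
```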

Error in train.default(x, y, weights = w, …) : final tuning parameters could not be determined

Submitted by 爱⌒轻易说出口 on 2019-11-29 22:45:58
This question was migrated from Cross Validated because it can be answered on Stack Overflow. Migrated 4 years ago.

I am very new to machine learning and am attempting the forest cover prediction competition on Kaggle, but I am getting hung up pretty early on. I get the following error when I run the code below:

```
Error in train.default(x, y, weights = w, ...) :
  final tuning parameters could not be determined
In addition: There were 50 or more warnings (use warnings() to see the first 50)
```
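One frequent fix for this caret error (a sketch under the assumption that the failing call used `method = "rf"`): supply an explicit `tuneGrid` so caret does not have to infer the tuning parameters from failed resamples. On a small built-in dataset:

```r
# Explicitly enumerating mtry candidates avoids caret having to
# determine "final tuning parameters" from resamples that errored out.
library(caret)
set.seed(7)
fit <- train(Species ~ ., data = iris, method = "rf",
             tuneGrid = data.frame(mtry = c(2, 3, 4)),
             trControl = trainControl(method = "cv", number = 3))
fit$bestTune
```

Inspecting `warnings()` after the failure is also worthwhile, since the 50+ warnings usually name the real problem (e.g. classes with too few rows per resample).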

cforest prints empty tree

Submitted by 泄露秘密 on 2019-11-29 21:15:15
I'm trying to use the cforest function (R, party package). This is what I do to construct the forest:

```r
library("party")
set.seed(42)
readingSkills.cf <- cforest(score ~ ., data = readingSkills,
                            control = cforest_unbiased(mtry = 2, ntree = 50))
```

Then I want to print the first tree, so I do:

```r
party:::prettytree(readingSkills.cf@ensemble[[1]],
                   names(readingSkills.cf@data@get("input")))
```

The result looks like this:

```
1) shoeSize <= 28.29018; criterion = 1, statistic = 89.711
2) age <= 6; criterion = 1, statistic = …
```
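An alternative that avoids poking at party's internals (a sketch, assuming the newer partykit package is acceptable): partykit's `cforest()` provides `gettree()`, which extracts a single tree as a regular party object that prints and plots in full.

```r
# partykit::gettree() returns one member of the ensemble as a
# printable/plottable tree, with no need for ::: internals.
library(partykit)
set.seed(42)
cf <- cforest(dist ~ speed, data = cars, ntree = 50)
t1 <- gettree(cf, tree = 1)
print(t1)   # full split structure of the first tree
```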

Random Forest Feature Importance Chart using Python

Submitted by 断了今生、忘了曾经 on 2019-11-29 20:41:59
I am working with RandomForestRegressor in Python and I want to create a chart that illustrates the ranking of feature importance. This is the code I used:

```python
from sklearn.ensemble import RandomForestRegressor

MT = pd.read_csv("MT_reduced.csv")
df = MT.reset_index(drop=False)
columns2 = df.columns.tolist()

# Filter the columns to remove ones we don't want.
columns2 = [c for c in columns2 if c not in ["Violent_crime_rate", "Change_Property_crime_rate", "State", "Year"]]

# Store the variable we'll be predicting on.
target = "Property_crime_rate"

# Let's randomly split our data with 80% as the …
```
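The general pattern for such a chart (a sketch on a bundled dataset, since `MT_reduced.csv` is the asker's own file): fit the regressor, sort `feature_importances_`, and draw a horizontal bar chart.

```python
# Rank feature_importances_ from a fitted RandomForestRegressor and
# plot them; the Agg backend lets this run without a display.
import numpy as np
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

data = load_diabetes()
model = RandomForestRegressor(n_estimators=50, random_state=0)
model.fit(data.data, data.target)

order = np.argsort(model.feature_importances_)
plt.barh(np.array(data.feature_names)[order],
         model.feature_importances_[order])
plt.xlabel("importance")
plt.tight_layout()
plt.savefig("importances.png")
```

The importances are normalized to sum to 1, so the bars are directly comparable across features.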

RandomForestClassifier.fit(): ValueError: could not convert string to float

Submitted by 穿精又带淫゛_ on 2019-11-29 20:35:33
Given is a simple CSV file:

```
A,B,C
Hello,Hi,0
Hola,Bueno,1
```

Obviously the real dataset is far more complex than this, but this one reproduces the error. I'm attempting to build a random forest classifier for it, like so:

```python
cols = ['A', 'B', 'C']
col_types = {'A': str, 'B': str, 'C': int}
test = pd.read_csv('test.csv', dtype=col_types)

train_y = test['C'] == 1
train_x = test[cols]

clf_rf = RandomForestClassifier(n_estimators=50)
clf_rf.fit(train_x, train_y)
```

But I just get this traceback when invoking fit():

```
ValueError: could not convert string to float: 'Bueno'
```

The scikit-learn version is 0.16.1.

You…
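scikit-learn estimators require numeric inputs, so the string columns have to be encoded first. A minimal sketch of the usual fix with `pd.get_dummies` (one-hot encoding), building the frame inline rather than reading test.csv:

```python
# One-hot encode the string columns before fitting; each distinct
# string becomes its own 0/1 indicator column.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

df = pd.DataFrame({"A": ["Hello", "Hola"],
                   "B": ["Hi", "Bueno"],
                   "C": [0, 1]})
train_y = df["C"] == 1
train_x = pd.get_dummies(df[["A", "B"]])  # A_Hello, A_Hola, B_Hi, B_Bueno

clf_rf = RandomForestClassifier(n_estimators=10, random_state=0)
clf_rf.fit(train_x, train_y)
print(sorted(train_x.columns))
```

For high-cardinality columns, `sklearn.preprocessing.OneHotEncoder` or `OrdinalEncoder` inside a pipeline is the more scalable variant of the same idea.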

How do I solve overfitting in random forest of Python sklearn?

Submitted by 孤者浪人 on 2019-11-29 20:19:27
I am using the RandomForestClassifier implemented in the Python sklearn package to build a binary classification model. Below are the results of cross-validation:

```
Fold 1 : Train: 164 Test: 40 Train Accuracy: 0.914634146341 Test Accuracy: 0.55
Fold 2 : Train: 163 Test: 41 Train Accuracy: 0.871165644172 Test Accuracy: 0.707317073171
Fold 3 : Train: 163 Test: 41 Train Accuracy: 0.889570552147 Test Accuracy: 0.585365853659
Fold 4 : Train: 163 Test: 41 Train Accuracy: 0.871165644172 Test Accuracy: 0.756097560976
Fold 5 : Train: 163 Test: 41 Train Accuracy: 0.883435582822 Test Accuracy: 0.512195121951
```

I…
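The large train/test accuracy gap above is the classic overfitting signature. The usual levers are constraining tree complexity (`max_depth`, `min_samples_leaf`) and checking performance with cross-validation rather than training accuracy. A sketch on a bundled dataset (the asker's ~200-row data is not available):

```python
# A deliberately constrained forest: shallow trees with a minimum
# leaf size generalize better on small datasets.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)
shallow = RandomForestClassifier(n_estimators=100, max_depth=3,
                                 min_samples_leaf=5, random_state=0)
scores = cross_val_score(shallow, X, y, cv=5)
print(round(scores.mean(), 3))  # mean CV accuracy, not train accuracy
```

With only ~160 training rows per fold, gathering more data often helps more than any hyperparameter, a point worth keeping in mind alongside the tuning.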

How to improve randomForest performance?

Submitted by ℡╲_俬逩灬. on 2019-11-29 20:01:52
I have a training set of size 38 MB (12 attributes, 420,000 rows). I am running the R snippet below to train the model using randomForest, and it is taking hours:

```r
rf.model <- randomForest(
  Weekly_Sales~.,
  data=newdata,
  keep.forest=TRUE,
  importance=TRUE,
  ntree=200,
  do.trace=TRUE,
  na.action=na.roughfix
)
```

I think it is taking a long time to execute because of na.roughfix; there are many NAs in the training set. Could someone let me know how I can improve the performance? My system configuration is:

Intel(R) Core i7 CPU @ 2.90 GHz
RAM: 8 GB
HDD: 500 GB
64-bit OS

smci: (The tl;dr is you…
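Two common speedups, sketched below under the assumption that `newdata` is the asker's training frame and the doMC/foreach packages are available: impute the NAs once up front instead of on every call, and grow the forest in parallel chunks that are combined at the end.

```r
# 1) One-time imputation: na.roughfix inside randomForest() re-runs
#    the fix; doing it once beforehand removes that cost.
# 2) Parallel growth: grow 4 x 50 trees on 4 cores and merge them
#    with randomForest::combine (OOB error is lost in the merge).
library(randomForest)
library(foreach)
library(doMC)
registerDoMC(4)

newdata <- na.roughfix(newdata)

rf.model <- foreach(ntree = rep(50, 4), .combine = randomForest::combine,
                    .packages = "randomForest") %dopar% {
  randomForest(Weekly_Sales ~ ., data = newdata,
               ntree = ntree, keep.forest = TRUE)
}
```

Dropping `importance=TRUE` (permutation importance is expensive) and trialing with a smaller `ntree` first also cut the runtime substantially; the ranger package is a faster drop-in alternative for data of this size.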