random-forest

How do I replace the bootstrap step in the R package randomForest?

╄→尐↘猪︶ㄣ submitted on 2019-12-05 16:53:13
First some background, which is probably more interesting on stats.stackexchange: in my data analysis I try to compare the performance of different machine learning methods on time-series data (regression, not classification). For example, I train a boosting model and compare it with a random forest model (R package randomForest). I use time-series data where the explanatory variables are lagged values of other series and of the dependent variable. For some reason the random forest severely underperforms. One of the problems I could think of is that the random forest
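randomForest does not expose its bootstrap step, so one workaround is to build the ensemble by hand and swap in a resampling scheme that respects serial dependence, such as a block bootstrap. A minimal sketch in Python (my own helper names; this illustrates the idea, not the randomForest package's internals):

```python
# Hand-rolled "random forest" whose bootstrap step is replaced by a block
# bootstrap, which keeps contiguous runs of a time series together.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def block_bootstrap_indices(n, block_size, rng):
    """Draw contiguous blocks (with replacement) until n indices are collected."""
    idx = []
    while len(idx) < n:
        start = rng.integers(0, n - block_size + 1)
        idx.extend(range(start, start + block_size))
    return np.array(idx[:n])

def fit_block_bootstrap_forest(X, y, n_trees=100, block_size=10, seed=0):
    rng = np.random.default_rng(seed)
    trees = []
    for _ in range(n_trees):
        idx = block_bootstrap_indices(len(y), block_size, rng)
        # max_features="sqrt" mimics the per-split feature sampling of a forest
        tree = DecisionTreeRegressor(max_features="sqrt",
                                     random_state=int(rng.integers(1 << 31)))
        trees.append(tree.fit(X[idx], y[idx]))
    return trees

def predict_forest(trees, X):
    # Average the individual tree predictions, as a regression forest does.
    return np.mean([t.predict(X) for t in trees], axis=0)
```

The block size controls how much autocorrelation structure each resample preserves; with block_size=1 this degenerates to the ordinary i.i.d. bootstrap.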

How to install bigmemory and bigrf on Windows

天大地大妈咪最大 submitted on 2019-12-05 11:06:18
I have been trying to install bigmemory in my R installation. My OS is Windows 7 64-bit and I have tried R v2.15.1, v2.15.2 and v3.0.1 64-bit, but I can't get it to work. I have tried several options: download the current source and run install.packages("D:/Downloads/bigmemory_4.4.3.tar.gz", repos = NULL, type = "source") in R v3.0.1, which gives the error "ERROR: Unix-only package"; download older sources and run similar commands in the various R v2/v3 installations, which gives "ERROR: configuration failed for package 'bigmemory'". Any ideas? I am actually

randomForest does not work when the training set has more factor levels than the test set

孤者浪人 submitted on 2019-12-05 10:44:49
When trying to test my trained model on new test data that has fewer factor levels than my training data, predict() returns: "Type of predictors in new data do not match that of the training data." My training data has a variable with 7 factor levels and my test data has the same variable with 6 factor levels (all 6 ARE in the training data). When I add an observation containing the "missing" 7th level, the model runs, so I'm not sure why this happens or what the logic behind it is. I could see randomForest choking if the test set had more/different factor levels, but why
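The usual R-side fix is to re-encode the test column with the training levels, e.g. test$x <- factor(test$x, levels = levels(train$x)), so both factors carry the same level set even when a level has no observations. The same alignment problem exists in Python when categories are one-hot encoded; a sketch with made-up day-of-week data:

```python
# Align test-set categories to the training-set ones so that encodings built
# from each have identical columns, even when a category is absent from test.
import pandas as pd

train = pd.DataFrame({"day": pd.Categorical(
    ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"])})
test = pd.DataFrame({"day": ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat"]})

# Re-encode the test column with the full set of training categories;
# the "missing" Sun level is kept with zero observations.
test["day"] = pd.Categorical(test["day"],
                             categories=train["day"].cat.categories)

train_dummies = pd.get_dummies(train["day"])
test_dummies = pd.get_dummies(test["day"])
# Same columns in the same order -> a fitted model sees identical predictors.
```

This also explains the observed behavior: the check is about the factor's declared level set (which determines the predictor encoding), not about which levels actually occur in the rows.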

Workaround for 32-/64-bit serialization exception on sklearn RandomForest model

时光毁灭记忆、已成空白 submitted on 2019-12-05 09:44:20
If we serialize a random forest model using joblib on a 64-bit machine and then unpack it on a 32-bit machine, there is an exception: ValueError: Buffer dtype mismatch, expected 'SIZE_t' but got 'long long'. This question has been asked before: Scikits-Learn RandomForrest trained on 64bit python wont open on 32bit python, but it has gone unanswered since 2014. Sample code to learn the model (on a 64-bit machine): modelPath="../" featureVec=... labelVec = ... forest = RandomForestClassifier() randomSearch = RandomizedSearchCV(forest, param_distributions=param_dict, cv=10, scoring=
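The mismatch comes from pickling the fitted trees, whose internal index arrays are platform-dependent. One portable workaround (a sketch, not a fix for the pickle itself) is to ship only the tuned hyper-parameters as plain JSON and refit on the target machine, assuming the training data is available there:

```python
# Workaround sketch: move hyper-parameters (JSON) between machines instead of
# the pickled estimator, then refit. Data here is synthetic for illustration.
import json
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# --- on the 64-bit machine: tune/fit, then export only the parameters ---
tuned = RandomForestClassifier(n_estimators=50, max_depth=6, random_state=0)
tuned.fit(X, y)
params_json = json.dumps(tuned.get_params())   # ship this file/string across

# --- on the 32-bit machine: rebuild from the parameters and refit ---
params = json.loads(params_json)
clone = RandomForestClassifier(**params).fit(X, y)
```

This trades storage for a retraining run on the 32-bit side; it avoids the dtype issue entirely because nothing binary crosses the architecture boundary.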

R randomForest: data (x) has 0 rows

大兔子大兔子 submitted on 2019-12-05 05:55:29
I am using the randomForest function from the randomForest package to find the most important variables: my data frame is called urban and my response variable is revenue, which is numeric. urban.random.forest <- randomForest(revenue ~ ., y = urban$revenue, data = urban, ntree = 500, keep.forest = FALSE, importance = TRUE, na.action = na.omit) I get the following error: Error in randomForest.default(m, y, ...) : data (x) has 0 rows. In the source code the check is on the x argument: n <- nrow(x); p <- ncol(x); if (n == 0) stop("data (x) has 0 rows") but I could not understand what x is. I solved it. I had some columns
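The excerpt hints at the resolution: if some column is entirely missing, na.action = na.omit performs row-wise deletion and discards every row before the forest sees any data, so x (the predictor matrix) ends up empty. The same effect, reproduced with pandas as a stand-in:

```python
# An all-missing column makes row-wise NA deletion (R's na.omit, pandas'
# dropna) remove every single row -- hence "data (x) has 0 rows".
import numpy as np
import pandas as pd

urban = pd.DataFrame({
    "revenue": [10.0, 12.5, 9.8],
    "pop":     [100,  250,  175],
    "notes":   [np.nan, np.nan, np.nan],   # never recorded -> all missing
})

complete = urban.dropna()        # row-wise deletion, like na.omit: 0 rows left
fixed = urban.drop(columns=["notes"]).dropna()   # drop the empty column first
```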

Split data set and pass the subsets in parallel to function then recombine the results

随声附和 submitted on 2019-12-05 03:32:36
Question: Here is what I am trying to do using the foreach package. I have a data set with 600 rows and 58000 columns, with lots of missing values. I need to impute the missing values using the package "missForest", which is not parallel, and it takes too much time to run on this data at once. So I am thinking of dividing the data into 7 data sets (I have 7 cores) with the same number of rows (my lines) and different numbers of columns (markers), then using %dopar% to pass the data sets in parallel to
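The split / parallel-impute / recombine pattern can be sketched compactly; here in Python with a trivial mean imputer standing in for missForest (the structure, not the algorithm, is the point), and a thread pool standing in for R's %dopar% workers:

```python
# Split columns into chunks, impute each chunk in parallel, then glue the
# results back together in the original column order.
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def impute_chunk(chunk):
    """Fill NaNs in each column with that column's mean (missForest stand-in)."""
    col_means = np.nanmean(chunk, axis=0)
    return np.where(np.isnan(chunk), col_means, chunk)

def parallel_impute(X, n_chunks=7):
    chunks = np.array_split(X, n_chunks, axis=1)   # same rows, column subsets
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        filled = list(pool.map(impute_chunk, chunks))
    return np.hstack(filled)                       # recombine, order preserved
```

One caveat worth keeping in mind: missForest models columns jointly, so imputing disjoint column blocks independently changes its answer; the split trades some imputation quality for wall-clock time.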

caret train rf model - inexplicably long execution

不羁的心 submitted on 2019-12-05 01:46:19
Question: While trying to train a random forest model with the caret package, I noticed that the execution time is inexplicably long:

> set.seed = 1;
> n = 500;
> m = 30;
> x = matrix(rnorm(n * m), nrow = n);
> y = factor(sample.int(2, n, replace = T), labels = c("yes", "no"))
> require(caret);
> require(randomForest);
> print(system.time({rf <- randomForest(x, y);}));
   user  system elapsed
   0.99    0.00    0.98
> print(system.time({rfmod <- train(x = x, y = y,
+   method = "rf",
+   metric = "Accuracy",
+   trControl =
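The slowdown is not mysterious: with its defaults, caret's train() fits one forest per resample per tuning candidate (25 bootstrap resamples × 3 mtry values ≈ 75 forests, plus the final refit), not a single forest. The same multiplication is easy to see in scikit-learn's GridSearchCV, used here purely as an analogue:

```python
# Tuning multiplies the number of model fits: candidates x resamples.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=200, n_features=30, random_state=1)
grid = {"max_features": [3, 5, 10]}          # 3 candidates (like 3 mtry values)
search = GridSearchCV(RandomForestClassifier(n_estimators=50, random_state=1),
                      grid, cv=5)            # 5 resamples per candidate
search.fit(X, y)

n_fits = len(grid["max_features"]) * 5       # 15 forests before the final refit
```

So a roughly 75x slowdown over a bare randomForest() call is expected; shrinking trControl's resampling or fixing mtry via tuneGrid brings it back down.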

Use of randomForest() for classification in R?

牧云@^-^@ submitted on 2019-12-05 01:33:45
I originally had a data frame composed of 12 columns and N rows. The last column is my class (0 or 1). I had to convert the entire data frame to numeric with training <- sapply(training.temp, as.numeric). But then I thought I needed the class column to be a factor to use randomForest() as a classifier, so I did training[,"Class"] <- factor(training[,ncol(training)]). I proceed to creating the forest with training_rf <- randomForest(Class ~ ., data = trainData, importance = TRUE, do.trace = 100). But I'm getting two errors: 1: In Ops.factor(training[, "Status"], factor(training[, ncol
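In R, randomForest() does classification precisely when y is a factor, so the conversion matters (and note the excerpt mixes a matrix from sapply(), the name trainData vs training, and "Class" vs "Status", any of which can trigger such errors). In scikit-learn the regression/classification split is instead explicit in the class you choose; a sketch of making 0/1 labels unambiguously discrete:

```python
# scikit-learn analogue: pick RandomForestClassifier explicitly and give it
# discrete labels, rather than relying on the dtype of y as R does.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 11))                 # 11 predictor columns
y_numeric = (X[:, 0] > 0).astype(float)        # 0.0/1.0 -- looks continuous

y_class = y_numeric.astype(int).astype(str)    # explicit "0"/"1" class labels
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y_class)
```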

Error with Sklearn Random Forest Regressor

点点圈 submitted on 2019-12-05 00:46:41
When trying to fit a Random Forest Regressor model with y data that looks like this: [ 0.00000000e+00 1.36094276e+02 4.46608221e+03 8.72660888e+03 1.31375786e+04 1.73580193e+04 2.29420671e+04 3.12216341e+04 4.11395711e+04 5.07972062e+04 6.14904935e+04 7.34275322e+04 7.87333933e+04 8.46302456e+04 9.71074959e+04 1.07146672e+05 1.17187952e+05 1.26953374e+05 1.37736003e+05 1.47239359e+05 1.53943242e+05 1.78806710e+05 1.92657725e+05 2.08912711e+05 2.22855152e+05 2.34532982e+05 2.41391255e+05 2.48699216e+05 2.62421197e+05 2.79544300e+05 2.95550971e+05 3.13524275e+05 3.23365158e+05 3.24069067e+05 3
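The excerpt cuts off before the actual traceback, but a frequent stumble when fitting a regressor to a single series like the y above is the shape of the inputs: fit() expects a 2-D X of shape (n_samples, n_features) and a 1-D y. A sketch under that assumption, using a few of the values above:

```python
# RandomForestRegressor input shapes: X must be 2-D, y 1-D.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

y = np.array([0.0, 136.09, 4466.08, 8726.61, 13137.58, 17358.02])
t = np.arange(len(y), dtype=float)   # a single time-index feature, 1-D

model = RandomForestRegressor(n_estimators=20, random_state=0)
# model.fit(t, y) would raise "Expected 2D array, got 1D array instead"
model.fit(t.reshape(-1, 1), y)       # reshape the lone feature into a column
```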

How can the scikit-learn random forest sub-sample size be equal to the original training data size?

戏子无情 submitted on 2019-12-04 22:29:28
Question: In the documentation of the scikit-learn random forest classifier, it is stated that: "The sub-sample size is always the same as the original input sample size but the samples are drawn with replacement if bootstrap=True (default)." What I don't understand is: if the sample size is always the same as the input sample size, then how can we talk about a random selection? There is no selection here, because we use all the (and naturally the same) samples at each training. Am I missing something here?
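The key phrase is "with replacement": drawing n rows with replacement from n rows is still a genuine random selection, because some rows are drawn several times and others not at all. On average each bootstrap sample contains only about 63.2% (1 − 1/e) of the distinct rows, and each tree sees a different subset. A quick numerical check:

```python
# Fraction of distinct rows in one size-n bootstrap sample: ~ 1 - 1/e = 0.632.
import numpy as np

rng = np.random.default_rng(42)
n = 100_000
idx = rng.integers(0, n, size=n)           # one bootstrap sample of size n
unique_fraction = len(np.unique(idx)) / n  # close to 0.632
```

The ~36.8% of rows left out of each tree's sample are what scikit-learn and randomForest use for the out-of-bag (OOB) error estimate.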