random-forest

Using python generators in scikit-learn [closed]

拟墨画扇 submitted on 2019-12-08 11:50:07
Question: Closed 5 years ago. This question needs details or clarity and is not currently accepting answers. I was wondering whether, and how, it is possible to use a Python generator as data input to the .fit() functions of scikit-learn classifiers. Given the huge amount of data, this seems to make sense to me. In particular, I am about to implement a random forest approach. Regards K Answer 1: The answer is "no". To do
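To illustrate the constraint behind the question above: scikit-learn's .fit() methods take array-like input, so data coming from a generator has to be materialized before it is passed in. The sketch below uses only the standard library to pull fixed-size batches out of a generator. The names row_stream and batched are hypothetical, and the sketch makes no claim about what the truncated answer goes on to recommend.

```python
from itertools import islice


def row_stream(n_rows):
    """Hypothetical data source: lazily yields (features, label) pairs."""
    for i in range(n_rows):
        yield [i, i * 2], i % 2


def batched(rows, batch_size):
    """Yield lists of at most batch_size items from any iterable."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Materialize the stream batch by batch; each batch is an ordinary list
# that could be split into features and labels for an estimator.
batches = list(batched(row_stream(5), 2))
X_first = [features for features, _ in batches[0]]
y_first = [label for _, label in batches[0]]
```

Whether a particular estimator can learn from such batches incrementally is a separate question; the sketch only shows the materialization step.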

R random forest inconsistent predictions

三世轮回 submitted on 2019-12-08 07:03:05
Question: I recently built a random forest model using the ranger package in R. However, I noticed that the predictions stored in the ranger object during training (accessible with model$predictions) do not match the predictions I get if I run the predict command on the same dataset using the fitted model. The following code reproduces the problem on the mtcars dataset. I created a binary variable just for the sake of converting this to a classification problem, though I saw similar results with

Handling categorical features using scikit-learn

孤者浪人 submitted on 2019-12-08 02:49:55
Question: What am I doing? I am solving a classification problem using random forests. I have a set of fixed-length strings (10 characters long) that represent DNA sequences. The DNA alphabet consists of 4 letters, namely A, C, G, T. Here's a sample of my raw data: ATGCTACTGA ACGTACTGAT AGCTATTGTA CGTGACTAGT TGACTATGAT Each DNA sequence comes with experimental data describing a real biological response: the molecule was seen to elicit a biological response (1), or not (0). Problem: The training set
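One common way to turn fixed-length strings over a small alphabet into numeric features is one-hot encoding, with one 0/1 indicator per letter per position. The sketch below applies it to the five sample sequences quoted above. It is an illustration under that assumption, not necessarily the approach the question settles on, and the helper name one_hot is hypothetical.

```python
ALPHABET = "ACGT"


def one_hot(sequence):
    """Encode a DNA string as a flat list of 0/1 values, 4 per position."""
    encoded = []
    for base in sequence:
        if base not in ALPHABET:
            raise ValueError(f"unexpected base: {base!r}")
        # One indicator per letter of the alphabet, in A, C, G, T order.
        encoded.extend(1 if base == letter else 0 for letter in ALPHABET)
    return encoded


sequences = ["ATGCTACTGA", "ACGTACTGAT", "AGCTATTGTA", "CGTGACTAGT", "TGACTATGAT"]
X = [one_hot(s) for s in sequences]  # 5 rows, 40 columns each
```

Each row of X then has 10 × 4 = 40 columns, with exactly one 1 per sequence position.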

something similar to permutation accuracy importance in h2o package

坚强是说给别人听的谎言 submitted on 2019-12-07 13:35:34
Question: I fitted a random forest for my multinomial target with the randomForest package in R. While looking into variable importance, I found permutation accuracy importance, which is what I need for my analysis. I fitted a random forest with the h2o package too, but the only measures it shows me are relative_importance, scaled_importance, and percentage. My question is: can I extract a measure that shows which level of the target is best classified by the variable I want to examine?
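For readers unfamiliar with the term, the idea behind permutation accuracy importance can be sketched in a few lines: shuffle one feature column, re-score the model, and record the drop in accuracy. The version below is a simplified illustration on a fixed toy data set with a stand-in model (toy_model is hypothetical). It does not reproduce the out-of-bag computation of the randomForest package, and it does not use any h2o API.

```python
import random


def accuracy(model, X, y):
    """Fraction of rows for which the model predicts the true label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(model, X, y, column, rng):
    """Drop in accuracy after shuffling a single feature column."""
    baseline = accuracy(model, X, y)
    shuffled = [row[column] for row in X]
    rng.shuffle(shuffled)
    X_permuted = [row[:column] + [value] + row[column + 1:]
                  for row, value in zip(X, shuffled)]
    return baseline - accuracy(model, X_permuted, y)


# Toy data: the label equals feature 0; feature 1 is ignored by the model.
X = [[i % 2, i % 3] for i in range(30)]
y = [row[0] for row in X]


def toy_model(row):
    """Hypothetical stand-in for a fitted forest: predicts from feature 0."""
    return row[0]


rng = random.Random(0)
importances = [permutation_importance(toy_model, X, y, col, rng) for col in (0, 1)]
```

Shuffling the ignored feature leaves accuracy unchanged, so its importance is zero, while shuffling the feature the model relies on can only lower accuracy. A per-class variant would compute the same drop separately for the rows of each target level.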

Error in predicting raster with randomForest, Caret, and factor variables

空扰寡人 submitted on 2019-12-07 13:13:20
Question: I am trying to predict a raster layer with randomForest and the caret package, but it fails when I introduce factor variables. Without factors, everything works fine, but as soon as I bring a factor in, I get the error: Error in predict.randomForest(modelFit, newdata) : Type of predictors in new data do not match that of the training data. I have created some sample code below that walks through the process. I present it in a few steps for transparency and to provide a working example. (To skip

How do I replace the bootstrap step in the package randomForest r

本秂侑毒 submitted on 2019-12-07 07:25:41
Question: First some background info, which is probably more interesting on stats.stackexchange: In my data analysis I try to compare the performance of different machine learning methods on time series data (regression, not classification). For example, I have trained a boosting model and compare it with a random forest model (R package randomForest). I use time series data where the explanatory variables are lagged values of other data and of the dependent variable. For some reason

How to install BigMemory and bigrf on windows OS

一笑奈何 submitted on 2019-12-07 05:49:35
Question: I have been trying to install bigmemory on my R installation. My OS is Windows 7 64-bit, and I have tried it on R v2.15.1, 2.15.2, and 3.0.1 64-bit, but I can't get it to work. I have tried several options: download the current source and run the command in R v3.0.1 install.packages("D:/Downloads/bigmemory_4.4.3.tar.gz", repos = NULL, type="source") but this gives an error "ERROR: Unix-only package"; download older sources and run similar commands in the various installations of R (v2, v3, etc.). This

caret - random-forests not working: “Something is wrong; all the Accuracy metric values are missing:”

旧时模样 submitted on 2019-12-07 05:22:32
Question: Related to these: getting this error in Caret https://github.com/topepo/caret/issues/160 I'm getting this error:

    Something is wrong; all the Accuracy metric values are missing:
        Accuracy       Kappa
     Min.   : NA   Min.   : NA
     1st Qu.: NA   1st Qu.: NA
     Median : NA   Median : NA
     Mean   :NaN   Mean   :NaN
     3rd Qu.: NA   3rd Qu.: NA
     Max.   : NA   Max.   : NA
     NA's   :5     NA's   :5
    Error in train.default(x, y, weights = w, ...) : Stopping
    In addition: Warning message:
    In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo

R random forest : data (x) has 0 rows

此生再无相见时 submitted on 2019-12-07 02:17:29
Question: I am using the randomForest function from the randomForest package to find the most important variable. My data frame is called urban and my response variable is revenue, which is numeric. urban.random.forest <- randomForest(revenue ~ .,y=urban$revenue, data = urban, ntree=500, keep.forest=FALSE,importance=TRUE,na.action = na.omit) I get the following error: Error in randomForest.default(m, y, ...) : data (x) has 0 rows In the source code, it is related to the x variable: n <- nrow(x) p <- ncol(x) if (n ==

Use of randomforest() for classification in R?

痴心易碎 submitted on 2019-12-06 21:28:20
Question: I originally had a data frame composed of 12 columns and N rows. The last column is my class (0 or 1). I had to convert my entire data frame to numeric with training <- sapply(training.temp,as.numeric) But then I thought I needed the class column to be a factor column to use randomForest() as a classifier, so I did training[,"Class"] <- factor(training[,ncol(training)]) I then create the tree with training_rf <- randomForest(Class ~., data = trainData, importance = TRUE, do