R random forest : data (x) has 0 rows

Posted by 此生再无相见时 on 2019-12-07 02:17:29

Question


I am using the randomForest function from the randomForest package to find the most important variables. My data frame is called urban, and my response variable, revenue, is numeric.

urban.random.forest <- randomForest(revenue ~ ., y = urban$revenue, data = urban, ntree = 500, keep.forest = FALSE, importance = TRUE, na.action = na.omit)

I get the following error:

Error in randomForest.default(m, y, ...) : data (x) has 0 rows

In the source code, the error is tied to the variable x:

n <- nrow(x)
p <- ncol(x)
if (n == 0)
    stop("data (x) has 0 rows")

but I cannot work out what x is.


Answer 1:


I solved it. Some of my columns contained only NA values or a single repeated value. After I dropped them, the call ran fine. My column classes were character, numeric, and factor.

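# A column is uninformative if, ignoring NAs, it holds at most one distinct value
# (this covers all-NA columns and constant columns of any class).
is.uninformative <- function(column) length(unique(column[!is.na(column)])) <= 1
candidatesnodata.index <- which(vapply(dataframe, is.uninformative, logical(1)))

# Only subset when something was flagged: dataframe[, -integer(0)] would select no columns.
if (length(candidatesnodata.index) > 0) {
  dataframe <- dataframe[, -candidatesnodata.index, drop = FALSE]
}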
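A likely reason the all-NA columns matter: with na.action = na.omit, every row that contains at least one NA is dropped, so a single column that is entirely NA removes every row. The x in the error message appears to be the data that the formula interface builds from the formula and the data frame, and it is empty by then. The following is a minimal base-R sketch of how to check this before calling randomForest. The toy urban data frame and its column names are made up for illustration.

```r
# Toy stand-in for the `urban` data frame (hypothetical values and columns).
urban <- data.frame(
  revenue = c(10, 20, 30, 40),
  area    = c(1.5, 2.0, 2.5, 3.0),
  empty   = rep(NA_real_, 4)  # a column that is entirely NA
)

# na.omit drops every row with at least one NA, so the all-NA column removes all rows.
rows.left <- nrow(na.omit(urban))

# A formula interface typically builds its data through a model frame like this one;
# with na.action = na.omit, the frame ends up with no rows.
mf <- model.frame(revenue ~ ., data = urban, na.action = na.omit)
rows.in.frame <- nrow(mf)

# Count NAs per column to see which column is responsible.
na.per.column <- colSums(is.na(urban))

print(rows.left)
print(rows.in.frame)
print(na.per.column)
```

If rows.left is 0, the NA counts per column show which columns to drop or impute before fitting.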



Answer 2:


I had a similar problem. It came from passing a string version of the formula

y ~ x1 + .... xn

to the formula argument of the randomForest call. The simple fix was to convert the string with as.Formula().

I hope this saves anyone some time!
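As a rough illustration of the string-to-formula conversion, here is a sketch using base R's as.formula() from the stats package (the as.Formula() spelling above belongs to the separate Formula package). The variable names in the string are hypothetical, and the randomForest call is left as a comment because it needs the package and a real data frame.

```r
# Hypothetical formula assembled as a character string.
formula.string <- "revenue ~ area + population"

# A character string is not a formula object; as.formula() converts it.
model.formula <- as.formula(formula.string)

print(class(formula.string))
print(class(model.formula))
print(all.vars(model.formula))

# The converted object can then be passed on, for example (not run here):
# randomForest(model.formula, data = urban, na.action = na.omit)
```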



Source: https://stackoverflow.com/questions/22609010/r-random-forest-data-x-has-0-rows
