R caret package: rfe never finishes (error: task 1 failed - "replacement has length zero")

Submitted by ぃ、小莉子 on 2019-11-29 11:56:17

So, looking at the data, there are three reasons for the failure. First, x contains a factor:

> str(x)
'data.frame':   100 obs. of  34 variables:
 $ f2  : Factor w/ 10 levels "1","2","3","4",..: 8 8 8 8 9 8 9 9 7 8 ...
<snip>

rfe fits an lm model to these data and, because the factor f2 is expanded into several dummy-variable coefficients, ends up with 39 coefficients even though the data frame x has only 34 columns. As a result, rfe gets... confused. Try using model.matrix to convert the factor to dummy variables yourself before running rfe:

x2 <- model.matrix(~., data = x)[,-1]  ## the -1 removes the intercept column
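The mismatch between columns and coefficients is easy to reproduce on a small example. This is a minimal base-R sketch with a hypothetical data frame (`toy`, with made-up columns `num` and `grp`, not the question's data): under the default treatment contrasts, model.matrix turns a factor with k levels into k-1 dummy columns, with the first level acting as the baseline.

```r
# Hypothetical toy data: one numeric column and one 4-level factor
toy <- data.frame(
  num = c(1.2, 3.4, 2.2, 5.1, 4.0, 0.7),
  grp = factor(c("a", "b", "c", "d", "a", "b"))
)

# Expand the factor into dummy variables; [, -1] drops the intercept column
mm <- model.matrix(~ ., data = toy)[, -1]

ncol(toy)      # 2 columns in the data frame
ncol(mm)       # 4 columns: num plus 3 dummies ("a" is the baseline level)
colnames(mm)   # "num" "grpb" "grpc" "grpd"
```

Two data-frame columns become four model columns, which is the same effect that makes the lm fit inside rfe carry more coefficients than x has columns.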

... but...

> table(x$f2)

 1  2  3  4  6  7  8  9 10 11 
 0  0  0  2  2  5 32 36 23  0 

so model.matrix will generate some zero-variance predictors: the dummy columns for the empty levels contain nothing but zeros, which is a problem. You could make a new factor that excludes the empty levels, but keep in mind that resampling can still turn sparse levels (e.g. "4" and "6", with only two observations each) into zero-variance predictors, because a given resample may not contain any rows with those levels.
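A minimal base-R sketch of both points, assuming a hypothetical factor `f` whose declared levels "10" and "11" are never observed (the values and the resample rows are made up for illustration). droplevels() is one way to build the "new factor that excludes the empty levels":

```r
# Hypothetical factor: levels "10" and "11" are declared but never observed
f <- factor(c("7", "8", "8", "9", "7", "8"),
            levels = c("7", "8", "9", "10", "11"))
table(f)

# Dummy columns for the empty levels are all zero, i.e. zero-variance
mm <- model.matrix(~ f)[, -1]
apply(mm, 2, var)              # the f10 and f11 columns have variance 0

# droplevels() removes the unused levels before expansion
f_clean <- droplevels(f)
mm2 <- model.matrix(~ f_clean)[, -1]
apply(mm2, 2, var)             # no zero-variance columns left

# A resample that happens to miss the single "9" row recreates the problem
boot_rows <- c(1, 2, 3, 5, 6)  # hypothetical resample indices
f_sub <- f_clean[boot_rows]    # subsetting a factor keeps all of its levels
mm3 <- model.matrix(~ f_sub)[, -1]
apply(mm3, 2, var)             # the f_sub9 column is zero-variance again
```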

Secondly, there is perfect correlation between some predictors:

> cor(x$f597, x$f599)
     [,1]
[1,]    1

This causes NA values for some of the model coefficients, which leads to missing variable importances and will tank rfe.
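To see where the NA comes from, here is a small base-R sketch with hypothetical predictors `p1` and `p2` (made up, not columns from the question), where `p2` is an exact linear function of `p1`. lm() cannot estimate both, so the aliased coefficient comes back as NA, and summary() leaves it out of the coefficient table entirely:

```r
set.seed(1)
# Hypothetical predictors: p2 is an exact linear function of p1
p1 <- rnorm(20)
p2 <- 2 * p1 + 3
y  <- 1 + 0.5 * p1 + rnorm(20, sd = 0.1)

cor(p1, p2)                 # 1, up to floating-point rounding
fit <- lm(y ~ p1 + p2)
coef(fit)                   # the p2 coefficient is NA (aliased with p1)

# summary() drops aliased terms, so p2 has no row to derive an importance from
rownames(summary(fit)$coefficients)
```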

Unless you are using trees or some other model that is tolerant to sparse and/or correlated predictors, a possible workflow prior to rfe could be:

> x2 <- model.matrix(~., data = x)[,-1]
> 
> nzv <- nearZeroVar(x2)
> x3 <- x2[, -nzv]
> 
> corr_mat <- cor(x3)
> too_high <- findCorrelation(corr_mat, cutoff = .9)
> x4 <- x3[, -too_high]
> 
> c(ncol(x2), ncol(x3), ncol(x4))
[1] 42 37 27
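One caveat with the `x2[, -nzv]` idiom: in R, negating an empty index vector selects no columns at all, so if nearZeroVar() or findCorrelation() flag nothing and return an empty index, the subset silently drops every predictor. It doesn't bite here (the column counts go 42, 37, 27), but a guard is cheap. `drop_cols` below is a hypothetical helper, not a caret function:

```r
# Negative indexing with an empty index vector selects zero columns, not all
m <- matrix(1:6, nrow = 2, dimnames = list(NULL, c("a", "b", "c")))
drop_idx <- integer(0)   # an empty "columns to remove" result
ncol(m[, -drop_idx, drop = FALSE])   # 0: every column is lost

# Hypothetical helper: only subset when there is something to drop
drop_cols <- function(mat, idx) {
  if (length(idx) == 0) return(mat)
  mat[, -idx, drop = FALSE]
}
ncol(drop_cols(m, drop_idx))   # 3: nothing flagged, nothing dropped
ncol(drop_cols(m, 2L))         # 2: column "b" removed
```

With the helper, the workflow lines would read `x3 <- drop_cols(x2, nzv)` and `x4 <- drop_cols(x3, too_high)`.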

Lastly, by the looks of y, you want to predict a number, but lrFuncs is for logistic regression, so I assume it was a typo for lmFuncs. If that is the case, rfe works fine:

> subsets <- c(1:5, 10, 15, 20, 25)
> ctrl <- rfeControl(functions = lmFuncs,
+                    method = "repeatedcv",
+                    repeats = 1,
+                    number = 5)
> set.seed(1)
> lrProfile <- rfe(as.data.frame(x4), y,
+                  sizes = subsets,
+                  rfeControl = ctrl)
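For intuition about what rfe is doing with a linear model, here is a simplified base-R sketch of backward feature selection on the built-in mtcars data. The function `rfe_sketch` and its arguments are hypothetical, not part of caret, and the design is an assumption: predictors are ranked once by the absolute t-statistic of the full lm fit, and each subset size is scored by 5-fold cross-validated RMSE. Because the ranking is computed on all rows before the folds are split, its RMSE estimates will be optimistic; treat it as an illustration, not caret's algorithm.

```r
# Hypothetical, simplified stand-in for rfe() with a linear model:
# rank predictors once by |t| from the full lm fit, then score each
# subset size by k-fold cross-validated RMSE. Not caret's algorithm.
rfe_sketch <- function(x, y, sizes, folds = 5, seed = 1) {
  set.seed(seed)
  dat <- data.frame(y = y, x)

  # Rank predictors by absolute t-statistic from the model with all of them
  full <- lm(y ~ ., data = dat)
  tstats <- summary(full)$coefficients[-1, "t value"]
  ranking <- names(sort(abs(tstats), decreasing = TRUE))

  # Randomly assign each row to one of `folds` folds
  fold_id <- sample(rep(seq_len(folds), length.out = nrow(dat)))

  cv_rmse <- sapply(sizes, function(k) {
    keep <- c("y", ranking[seq_len(k)])   # top-k predictors plus the response
    fold_rmse <- sapply(seq_len(folds), function(f) {
      train <- dat[fold_id != f, keep, drop = FALSE]
      test  <- dat[fold_id == f, keep, drop = FALSE]
      fit <- lm(y ~ ., data = train)
      sqrt(mean((test$y - predict(fit, newdata = test))^2))
    })
    mean(fold_rmse)
  })
  data.frame(Variables = sizes, RMSE = cv_rmse)
}

# Usage on the built-in mtcars data: predict mpg from the other 10 columns
res <- rfe_sketch(mtcars[, -1], mtcars$mpg, sizes = c(1:5, 10))
print(res)
res$Variables[which.min(res$RMSE)]   # subset size with the lowest CV RMSE
```

If the full fit had an aliased (NA) coefficient, that predictor would be missing from `ranking`, and any size larger than `length(ranking)` would select an undefined column and fail, which mirrors how NA coefficients break the ranking step.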

Max
