caret/rfe-error: “there should be the same number of samples in x and y”

Submitted by 跟風遠走 on 2019-12-08 13:48:36

Question


My aim is to perform cross-validation with R. Columns 1-31 are features and column 32 is the output class.
I load the data from an .xls file, but I run into an error when I call rfe with my rfeControl object. Please see my code:

install.packages('e1071')
library(e1071)
install.packages('readxl')
library(readxl)
library(rpart)
install.packages('randomForest')
library(randomForest)
install.packages('party')
library(party)
install.packages('mlbench')
library(mlbench)
install.packages('caret')
library(caret)
#----------------------------------------------------------
# Import Data
getwd()
setwd("working_directory_name")
df <- read_excel('test_data.xls')
#----------------------------------------------------------
# Get Information on your data (optional)
str(df)
table(df$F32)
#----------------------------------------------------------
install.packages('XLConnect')
library(XLConnect)
# Recursive Feature Selection Approach
control <- rfeControl(functions=rfFuncs, method="cv", number=5)
#x = as.vector(unlist(df[, 2:29]))
#y = as.vector(unlist(df[, 32])) 
# Run the algorithm (features, ground truth, tested set sizes)
#results <- rfe(x, y, sizes=c(1:28), rfeControl=control)
results <- rfe(df[, 2:29], df[, 32], sizes=c(1:28), rfeControl=control)
# Visualize results for set sizes
print(results)
# List chosen features
predictors(results)
# plot the results
plot(results, type=c("g", "o"))

The result after running the code is:

Error in rfe.default(df[, 2:29], df[, 32], sizes = c(1:28), rfeControl = control) : there should be the same number of samples in x and y

I've already looked at these sites:
1. http://braziebrazie.blogspot.de/2015/08/caret-r-error-in-rfedefau-should-be.html
2. R rfe function "caret" Package error: there should be the same number of samples in x and y
3. R trying to get caret / rfe to work

The suggestion from 1. to unlist the vector doesn't work for me. The new error is:

Error in if (nrow(x) != length(y)) stop("there should be the same number of samples in x and y") : argument is of length zero

The example in 2. works without any problems:

set.seed(7)
d=data.frame(matrix(rnorm(2901*15,1,.5),ncol=15))
#something like dependent variable
dp=factor(sample(c(1,1,1,1, 1, 1,2,2,2, 3 ,3,3,4, 4, 4),2901,replace = TRUE))
# define the control using a random forest selection function
control <- rfeControl(functions=rfFuncs, method="cv", number=10)
# run the RFE algorithm
sz=50 # Change sz to 2901 for full sample
results <- rfe(d[1:sz, ],   dp[1:sz],   sizes=c(1:15), rfeControl=control)
# summarize the results
print(results)
plot(results, type=c("g", "o"))

In 3. it says

y should be a numeric or factor vector

But how do I define y as a numeric or factor vector?

This is the xls file format: [image: xls file format]
Maybe the problem is caused by the way I load the xls file.

Thanks a lot for your suggestions and recommendations!


Answer 1:


I had the same issue. Converting y to a matrix fixed it:

results <- rfe(df[, 2:29], as.matrix(df[, 32]), sizes=c(1:28), rfeControl=control)
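A possible explanation, sketched on made-up data (the tibble `toy` and its contents are hypothetical, not the asker's file): the second error message in the question shows that rfe compares nrow(x) with length(y). For a one-column tibble, length() counts columns, while for a one-column matrix it counts cells, which equals the number of rows.

```r
library(tibble)  # read_excel() returns tibbles; readxl depends on this package

# Hypothetical stand-in for the imported sheet: 6 samples, 2 features, 1 class column
toy <- tibble(
  F1  = c(0.1, 0.4, 0.3, 0.9, 0.7, 0.2),
  F2  = c(1.2, 0.8, 1.1, 0.5, 0.6, 1.0),
  F32 = c("a", "b", "a", "b", "a", "b")
)

y_tbl <- toy[, 3]             # still a tibble with one column
y_mat <- as.matrix(toy[, 3])  # 6 x 1 character matrix

length(y_tbl)                      # counts columns
length(y_mat)                      # counts cells (rows x 1)
nrow(toy[, 1:2]) == length(y_mat)  # the comparison rfe relies on
```

The third link in the question states that y should be a numeric or factor vector, so if the class column is stored as text, converting it with as.factor(df[[32]]) may be the safer route than a character matrix.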



Answer 2:


Modify your call to rfe like so:

results <- rfe(df[, 2:29], df[[32]], sizes=c(1:28), rfeControl=control)

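Note the change from single [ ] to double [[ ]] brackets. read_excel returns a tibble, so df[, 32] is still a one-column tibble, whereas df[[32]] extracts the column as a plain vector.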
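The difference can be checked on a small made-up tibble (the names `toy`, `x`, `y_bad`, `y_ok`, `x_flat` and `y_factor` are illustrative only, not from the asker's data). The sketch also suggests why unlisting x, as tried in the question, produces the second error: a plain vector has no rows, so nrow() returns NULL and the comparison inside if() has length zero.

```r
library(tibble)  # read_excel() returns tibbles; readxl depends on this package

# Hypothetical stand-in for the imported sheet
toy <- tibble(
  F1  = c(0.1, 0.4, 0.3, 0.9, 0.7, 0.2),
  F2  = c(1.2, 0.8, 1.1, 0.5, 0.6, 1.0),
  F32 = c(1, 2, 1, 2, 1, 2)
)

x     <- toy[, 1:2]
y_bad <- toy[, 3]   # single bracket: a one-column tibble
y_ok  <- toy[[3]]   # double bracket: a plain numeric vector

is.data.frame(y_bad)      # TRUE
is.vector(y_ok)           # TRUE
nrow(x) != length(y_bad)  # TRUE, so the size check fails
nrow(x) != length(y_ok)   # FALSE, so the size check passes

# Unlisting x flattens it to a vector, and a vector has no row count
x_flat <- as.vector(unlist(x))
is.null(nrow(x_flat))     # TRUE

# For a classification task, the labels can be turned into a factor
y_factor <- as.factor(y_ok)
levels(y_factor)
```

With the asker's data this would correspond to something like rfe(df[, 2:29], as.factor(df[[32]]), sizes=c(1:28), rfeControl=control), assuming column 32 holds class labels.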



Source: https://stackoverflow.com/questions/48902732/caret-rfe-error-there-should-be-the-same-number-of-samples-in-x-and-y
