Question
I have several algorithms: rpart, kNN, logistic regression, randomForest, Naive Bayes, and SVM. I'd like to use forward/backward and genetic algorithm selection for finding the best subset of features to use for the particular algorithms.
How can I implement wrapper type forward/backward and genetic selection of features in R?
Answer 1:
I'm testing wrappers at the moment, so I'll give you a few package names in R. What is a wrapper? A wrapper method searches for a good feature subset by repeatedly training the target model on candidate subsets and scoring its predictive performance, rather than ranking features independently of the model.
Now to the methods. MASS package: choose a model by AIC in a stepwise algorithm:
stepAIC(model, direction = "both", trace = FALSE)
stepAIC(model, direction = "backward", trace = FALSE)
stepAIC(model, direction = "forward", trace = FALSE)
caret package: backwards feature selection (recursive feature elimination):
control <- rfeControl(functions = lmFuncs, method = "repeatedcv", number = 5, verbose = TRUE)
rfe_results <- rfe(x, y, sizes = c(1:10), rfeControl = control)
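lmFuncs fits linear models, so for the classifiers in your question you would swap in a classification helper. A rough sketch of a classification run, assuming the Sonar data from the mlbench package just as example data (rfFuncs wraps a random forest):
library(caret)
library(mlbench)
data(Sonar)
x <- Sonar[, -61]   # 60 numeric predictors
y <- Sonar$Class    # two-class factor
control <- rfeControl(functions = rfFuncs, method = "repeatedcv", number = 5)
rfe_results <- rfe(x, y, sizes = 1:10, rfeControl = control)
predictors(rfe_results)   # the selected subset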
or supervised feature selection using genetic algorithms (note that gafs wants its own gafsControl object rather than the rfeControl above):
ga_ctrl <- gafsControl(functions = rfGA, method = "cv", number = 5)
gafs_results <- gafs(x, y, iters = 10, gafsControl = ga_ctrl)
or simulated annealing feature selection (again with its own control object):
sa_ctrl <- safsControl(functions = rfSA, method = "cv", number = 5)
safs_results <- safs(x, y, iters = 10, safsControl = sa_ctrl)
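Both searches print the best subset they found; if I remember the API correctly, the chosen predictors are also stored in the optVariables element of the result:
gafs_results$optVariables   # features chosen by the genetic search
safs_results$optVariables   # features chosen by simulated annealing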
I hope this gives you a good overview. There are a lot more methods out there...
Answer 2:
The caret package in R has extensive functionality for doing this, and it would be very easy to switch amongst the algorithms you mentioned (see the sketch after the links below).
There is also a lot of documentation on their site:
- http://topepo.github.io/caret/featureselection.html for a general overview of feature selection in caret
- http://topepo.github.io/caret/rfe.html to implement recursive feature elimination
- http://topepo.github.io/caret/GA.html for genetic algorithms.
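To show what "easy to switch" looks like, here is a rough sketch, assuming x is a predictor data frame and y a class factor: with caretFuncs, the model wrapped by rfe is whatever caret::train method you pass, so changing the classifier is a one-argument change.
library(caret)
ctrl <- rfeControl(functions = caretFuncs, method = "cv", number = 5)
# RFE wrapped around an SVM; swap method for "knn", "rf", "nb", "rpart", ... to wrap another classifier
svm_rfe <- rfe(x, y, sizes = 1:10, rfeControl = ctrl,
               method = "svmRadial",
               trControl = trainControl(method = "cv", number = 3))
predictors(svm_rfe)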
Hope this helps
Answer 3:
Here is some code for forward feature selection, using kNN as the wrapper classifier:
selectFeature <- function(train, test, cls.train, cls.test, features) {
  ## identify the single feature that, added to `features`, gives the best test accuracy
  current.best.accuracy <- -Inf  # negative infinity
  selected.i <- NULL
  for (i in 1:ncol(train)) {
    current.f <- colnames(train)[i]
    if (!current.f %in% features) {
      model <- knn(train = train[, c(features, current.f)],
                   test  = test[, c(features, current.f)],
                   cl = cls.train, k = 3)
      test.acc <- sum(model == cls.test) / length(cls.test)
      if (test.acc > current.best.accuracy) {
        current.best.accuracy <- test.acc
        selected.i <- current.f
      }
    }
  }
  return(selected.i)
}
## example: forward selection on the Sonar data
library(caret)
library(class)    # for knn()
library(mlbench)  # for the Sonar data
data(Sonar)
set.seed(1)
inTrain <- createDataPartition(Sonar$Class, p = .6)[[1]]
allFeatures <- colnames(Sonar)[-61]     # column 61 is the Class label
train <- Sonar[ inTrain, -61]
test  <- Sonar[-inTrain, -61]
cls.train <- Sonar$Class[ inTrain]
cls.test  <- Sonar$Class[-inTrain]
# use correlation with the class label to determine the first feature
cls.train.numeric <- as.numeric(cls.train == "M")   # 0/1 code aligned with each training sample
features <- c()
current.best.cor <- 0
for (i in 1:ncol(train)) {
  if (current.best.cor < abs(cor(train[, i], cls.train.numeric))) {
    current.best.cor <- abs(cor(train[, i], cls.train.numeric))
    features <- colnames(train)[i]
  }
}
print(features)
# grow the subset to 10 features, using kNN as the wrapper classifier
for (j in 2:10) {
  selected.i <- selectFeature(train, test, cls.train, cls.test, features)
  print(selected.i)
  # add the best feature from the current run
  features <- c(features, selected.i)
}
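To check how well the selected subset generalizes, you can score a final kNN fit on the held-out split:
# accuracy of a 3-NN classifier using only the selected features
final_pred <- knn(train = train[, features], test = test[, features],
                  cl = cls.train, k = 3)
mean(final_pred == cls.test)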
Source: https://stackoverflow.com/questions/36746575/how-to-use-wrapper-feature-selection-algorithms-in-r