R, DMwR-package, SMOTE-function won't work

Submitted by 时间秒杀一切 on 2019-12-05 10:10:25

I don't have the full answer. I can provide another clue though:

If you convert 'y' to a factor, SMOTE will return without error - but the synthesized observations have NA values for x.

SMOTE has a bug on 32-bit Windows 7: it assumes that the target variable named in the 'form' parameter is the last column of the dataset. The following code demonstrates this:

library(DMwR)
data(iris)
# data <- iris[, c(1, 2, 5)]  # target is the last column: SMOTE works
data <- iris[, c(2, 5, 1)]  # target is not the last column: SMOTE fails
data$Species <- factor(ifelse(data$Species == "setosa", "rare", "common"))
head(data)
table(data$Species)
newData <- SMOTE(Species ~., data, perc.over=600, perc.under=100)
table(newData$Species)

It produces the following error message:

Error in `colnames<-`(`*tmp*`, value = c("Sepal.Width", "Species", "Sepal.Length" : 'names' attribute [3] must be the same length as the vector [2]

On 64-bit Windows 7, the column-order problem does not occur.
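If the column order is indeed the trigger, as reported here, one possible workaround is to move the target column to the end before calling SMOTE. The sketch below is only an illustration: `move_target_last` is a hypothetical helper, not part of DMwR, and the SMOTE call is attempted only when DMwR is installed.

```r
# Hypothetical helper (not part of DMwR): move the target column to the end.
move_target_last <- function(df, target) {
  stopifnot(target %in% names(df))
  df[, c(setdiff(names(df), target), target), drop = FALSE]
}

data(iris)
data <- iris[, c(2, 5, 1)]  # target in the middle, as in the failing case
data$Species <- factor(ifelse(data$Species == "setosa", "rare", "common"))

fixed <- move_target_last(data, "Species")
names(fixed)  # "Sepal.Width" "Sepal.Length" "Species"

# Only attempt the SMOTE call if DMwR is available on this machine.
if (requireNamespace("DMwR", quietly = TRUE)) {
  newData <- DMwR::SMOTE(Species ~ ., fixed, perc.over = 600, perc.under = 100)
  print(table(newData$Species))
}
```

Reordering by name rather than by position keeps the helper independent of how many predictor columns the data frame has.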

There is a bug in the SMOTE code. It assumes the target variable (y) it is given is already a factor, and it currently does not handle the edge case of a non-factor target. Make sure to convert the target to a factor before calling the function.
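A minimal sketch of that conversion, using only base R. The helper name `ensure_factor_target` and the toy data frame are made up for illustration and are not part of DMwR. As noted in the first answer, converting to a factor may avoid the error without guaranteeing sensible synthesized values.

```r
# Hypothetical helper (not part of DMwR): make sure the target is a factor.
ensure_factor_target <- function(df, target) {
  stopifnot(target %in% names(df))
  if (!is.factor(df[[target]])) {
    df[[target]] <- factor(df[[target]])
  }
  df
}

# Toy data with a numeric 0/1 target (made up for illustration).
toy <- data.frame(x = c(1.2, 0.7, 3.1, 2.4, 0.3, 1.9),
                  y = c(0, 0, 0, 0, 1, 1))

toy <- ensure_factor_target(toy, "y")
is.factor(toy$y)  # TRUE
levels(toy$y)     # "0" "1"
```

The resulting data frame can then be passed to SMOTE with a formula such as `y ~ .`.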
