Question
I have a multi-class classification problem and the data are heavily skewed. My target variable (y) has 3 classes, with the following shares:
- 0 = 3%
- 1 = 90%
- 2 = 7%
I am looking for R packages that can do multi-class oversampling, undersampling, or both.
If this is not doable in R, where else can I handle this problem?
PS: I tried the ROSE package in R, but it only works for binary classification problems.
Answer 1:
Well, there is the caret package, which offers a wide range of ML algorithms, including ones for multi-class problems.
It can also apply down- and upsampling directly via downSample() and upSample():
library(caret)
# label must be a factor: downSample() expects a factor outcome
# (since R 4.0, data.frame() no longer turns strings into factors by default)
trainclass <- data.frame(label = factor(c(rep("class1", 100), rep("class2", 20), rep("class3", 180))),
                         predictor1 = rnorm(300, 0, 1),
                         predictor2 = sample(c("this", "that"), 300, replace = TRUE))
table(trainclass$label)
#> class1 class2 class3
#>    100     20    180
# then use
set.seed(234)
dtrain <- downSample(x = trainclass[, -1],
                     y = trainclass$label)
table(dtrain$Class)
#> class1 class2 class3
#>     20     20     20
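For intuition, here is a minimal base-R sketch of what the two caret helpers do for any number of classes: downsampling keeps, without replacement, as many rows per class as the smallest class has, while upsampling keeps every row and adds random duplicates until each class matches the largest one. The names down_sample_multiclass() and up_sample_multiclass() are hypothetical and are not part of caret; this only approximates caret's behaviour.

```r
# Hypothetical helpers (not caret's code) that roughly mirror what
# caret::downSample() / caret::upSample() do, for any number of classes.
down_sample_multiclass <- function(x, y) {
  stopifnot(is.factor(y), nrow(x) == length(y))
  n_min <- min(table(y))                               # size of the smallest class
  rows_by_class <- split(seq_along(y), y)              # row indices grouped by class
  idx <- unlist(lapply(rows_by_class, function(i)
    i[sample.int(length(i), n_min)]))                  # n_min rows per class, no replacement
  data.frame(x[idx, , drop = FALSE], Class = y[idx], row.names = NULL)
}

up_sample_multiclass <- function(x, y) {
  stopifnot(is.factor(y), nrow(x) == length(y))
  n_max <- max(table(y))                               # size of the largest class
  rows_by_class <- split(seq_along(y), y)
  idx <- unlist(lapply(rows_by_class, function(i)
    c(i, i[sample.int(length(i), n_max - length(i), replace = TRUE)])))  # keep all rows, add duplicates
  data.frame(x[idx, , drop = FALSE], Class = y[idx], row.names = NULL)
}

set.seed(234)
trainclass <- data.frame(
  label = factor(c(rep("class1", 100), rep("class2", 20), rep("class3", 180))),
  predictor1 = rnorm(300, 0, 1),
  predictor2 = sample(c("this", "that"), 300, replace = TRUE)
)
down <- down_sample_multiclass(trainclass[, -1], trainclass$label)
up <- up_sample_multiclass(trainclass[, -1], trainclass$label)
table(down$Class)  # 20 rows per class
table(up$Class)    # 180 rows per class
```

Downsampling discards information from the majority class, while upsampling only repeats existing rows, so neither creates new examples; that is what SMOTE-style methods add.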
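Nice feature: caret can also apply downsampling, upsampling, SMOTE or ROSE inside resampling procedures (such as cross-validation), so that only the training part of each resample is rebalanced. (ROSE itself, as noted in the question, only handles two classes.)
The following performs 10-fold cross-validation using downsampling: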
ctrl <- caret::trainControl(method = "cv",
                            number = 10,
                            verboseIter = FALSE,
                            classProbs = TRUE,
                            summaryFunction = caret::multiClassSummary,
                            sampling = "down")
set.seed(42)
# uses trainclass from above; method = "rf" requires the randomForest package
model_rf_under <- caret::train(label ~ .,
                               data = trainclass,
                               method = "rf",
                               trControl = ctrl)
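The point of sampling = "down" is that subsampling happens inside each resample: the training part of a fold is rebalanced, while the held-out part keeps the original class mix. The base-R sketch below (a hypothetical illustration, not caret's implementation) makes these mechanics visible with stratified 10-fold splits; the model-fitting step is left as a comment.

```r
# Hypothetical illustration of "subsampling inside resampling":
# only the training rows of each fold are down-sampled; the held-out
# fold keeps the original, imbalanced class distribution.
set.seed(42)
y <- factor(c(rep("class1", 100), rep("class2", 20), rep("class3", 180)))
x <- data.frame(predictor1 = rnorm(300), predictor2 = rnorm(300))

# Stratified fold assignment: spread each class evenly over 10 folds
folds <- integer(length(y))
for (cl in levels(y)) {
  i <- which(y == cl)
  folds[i] <- sample(rep_len(1:10, length(i)))
}

train_counts <- matrix(NA_integer_, 10, nlevels(y), dimnames = list(NULL, levels(y)))
test_counts <- train_counts
for (f in 1:10) {
  tr <- which(folds != f)
  # down-sample the training rows of this fold only
  n_min <- min(table(y[tr]))
  tr_bal <- unlist(lapply(split(tr, y[tr]), function(i) i[sample.int(length(i), n_min)]))
  train_counts[f, ] <- table(y[tr_bal])
  test_counts[f, ] <- table(y[folds == f])
  # a model would be fit on x[tr_bal, ] / y[tr_bal] and evaluated on x[folds == f, ]
}
train_counts  # every fold: 18 rows per class
test_counts   # every fold: 10 / 2 / 18, i.e. the original proportions
```

Rebalancing before splitting would instead let duplicated or discarded rows leak into the evaluation folds and give an overly optimistic picture of minority-class performance.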
See further information here: https://topepo.github.io/caret/subsampling-for-class-imbalances.html
Also check out the mlr package:
https://mlr.mlr-org.com/articles/tutorial/over_and_undersampling.html#sampling-based-approaches
Answer 2:
You can use the SMOTE() function from the DMwR package. I have created a sample dataset with three imbalanced classes:
# DMwR has been archived on CRAN; if this fails, install it from the CRAN archive
install.packages("DMwR")
library(DMwR)
## A small example with a data set created artificially from the iris data
data(iris)
# setosa 90%, versicolor ~3% and virginica ~7%
Species <- factor(c(rep("setosa", 135), rep("versicolor", 5), rep("virginica", 10)))
# SMOTE() needs a factor target, so build the data frame explicitly
data <- data.frame(iris[, 1:4], Species = Species)
table(data$Species)
Imbalanced classes:
    setosa versicolor  virginica
       135          5         10
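Now, to recover the two minority classes, apply SMOTE twice. Each DMwR::SMOTE() call oversamples only the least frequent class (all other classes are treated as the majority), so the first call boosts versicolor and the second boosts virginica:
First_Imbalance_recover <- DMwR::SMOTE(Species ~ ., data, perc.over = 2000, perc.under = 100)
Final_Imbalance_recover <- DMwR::SMOTE(Species ~ ., First_Imbalance_recover, perc.over = 2000, perc.under = 200)
table(Final_Imbalance_recover$Species)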
Final balanced classes (counts differ between runs):
    setosa versicolor  virginica
        79         81         84
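The perc.over and perc.under arguments are easiest to read as arithmetic. As the DMwR documentation describes them, for perc.over of 100 or more each minority example yields perc.over/100 synthetic examples, all original minority rows are kept, and perc.under/100 non-minority rows are drawn at random for every synthetic example. The helper smote_call_counts() below is hypothetical and covers only perc.over >= 100; which non-minority classes the drawn rows come from is random, so it reports only their total.

```r
# Hypothetical helper: expected sizes after ONE DMwR::SMOTE() call, following the
# documented meaning of perc.over / perc.under (only perc.over >= 100 handled here).
smote_call_counts <- function(n_min, perc.over, perc.under) {
  stopifnot(perc.over >= 100)
  n_synth <- n_min * floor(perc.over / 100)     # synthetic minority examples
  n_other <- floor(perc.under / 100 * n_synth)  # non-minority rows kept, drawn at random
  c(minority = n_min + n_synth, synthetic = n_synth, non_minority = n_other)
}

# First call in the example: 5 versicolor rows, perc.over = 2000, perc.under = 100
smote_call_counts(5, 2000, 100)  # minority = 105, synthetic = 100, non_minority = 100
```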
NOTE: The synthetic examples are generated from the k nearest neighbours of each minority-class example; the parameter k (default 5) controls how many of these neighbours are used. Because the neighbours, the interpolation points and the retained majority rows are all chosen at random, the class counts vary from run to run, which shouldn't affect the overall balance.
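To make the neighbour-based generation concrete, and to handle every minority class in one pass instead of calling SMOTE repeatedly, here is a minimal base-R sketch. smote_multiclass() is a hypothetical helper, not DMwR code: it assumes numeric predictors only, brings every class up to the size of the largest class, and picks seed points at random rather than cycling through each minority example as DMwR does. Each synthetic point lies on the segment between a seed point and one of its k nearest same-class neighbours.

```r
# Hypothetical SMOTE-style oversampler for several minority classes at once.
# Numeric predictors only; not part of DMwR.
smote_multiclass <- function(x, y, k = 5, target = max(table(y))) {
  stopifnot(is.data.frame(x), is.factor(y), nrow(x) == length(y))
  x_out <- x
  y_out <- y
  for (cl in levels(y)) {
    xc <- as.matrix(x[y == cl, , drop = FALSE])
    n_new <- target - nrow(xc)
    if (n_new <= 0 || nrow(xc) < 2) next   # nothing to add, or no neighbour available
    kk <- min(k, nrow(xc) - 1)             # cannot use more neighbours than exist
    d <- as.matrix(dist(xc))               # Euclidean distances within the class
    diag(d) <- Inf                         # a point is not its own neighbour
    synth <- matrix(NA_real_, n_new, ncol(xc), dimnames = list(NULL, colnames(xc)))
    for (i in seq_len(n_new)) {
      a <- sample.int(nrow(xc), 1)         # random seed point of this class
      nn <- order(d[a, ])[seq_len(kk)]     # its kk nearest same-class neighbours
      b <- nn[sample.int(kk, 1)]           # pick one neighbour at random
      synth[i, ] <- xc[a, ] + runif(1) * (xc[b, ] - xc[a, ])  # point on segment a -> b
    }
    x_out <- rbind(x_out, as.data.frame(synth))
    y_out <- factor(c(as.character(y_out), rep(cl, n_new)), levels = levels(y))
  }
  list(x = x_out, y = y_out)
}

set.seed(1)
Species <- factor(c(rep("setosa", 135), rep("versicolor", 5), rep("virginica", 10)))
res <- smote_multiclass(iris[, 1:4], Species)
table(res$y)  # 135 rows per class
```

Unlike the two DMwR calls above, this sketch never discards majority rows; combining it with a random undersampling step is one way to control the final size.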
Source: https://stackoverflow.com/questions/54779380/handling-imbalanced-data-in-multi-class-classification-problem