Combination of SMOTE and undersampling on Weka

北城余情 submitted on 2019-12-11 01:59:44

Question


According to the paper by Chawla et al. (2002), the best performance when balancing data comes from combining under-sampling with SMOTE.

I have tried to balance my dataset using under-sampling and SMOTE, but I am a bit confused about the parameter for under-sampling.

In Weka, the Resample filter can be used to decrease the majority class. It has a parameter, biasToUniformClass: "Whether to use bias towards a uniform class. A value of 0 leaves the class distribution as-is, a value of 1 ensures the class distribution is uniform in the output data."

When I use the value 0, the number of instances in the majority class goes down, and so does the number in the minority class. When I use the value 1, the majority class decreases but the minority class increases.
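The effect of biasToUniformClass on the class counts can be sketched with a little arithmetic. This is plain Python, not the Weka API: the function name `resample_class_counts`, the `sample_size_percent` argument, and the linear blend between the "as-is" and "uniform" extremes are assumptions made for illustration. Only the two endpoints (0 and 1) come from the parameter description quoted above.

```python
def resample_class_counts(class_counts, bias_to_uniform, sample_size_percent=100.0):
    """Approximate per-class output sizes for a given bias toward uniformity.

    Assumption (not taken from Weka's source): each class size is a linear
    blend between its original count (bias 0) and an equal share of the
    total number of instances (bias 1), scaled by the sample size percent.
    """
    total = sum(class_counts.values())
    uniform_share = total / len(class_counts)
    scale = sample_size_percent / 100.0
    return {
        label: round(scale * ((1 - bias_to_uniform) * count
                              + bias_to_uniform * uniform_share))
        for label, count in class_counts.items()
    }


# Hypothetical dataset: 900 majority and 100 minority instances.
counts = {"majority": 900, "minority": 100}

as_is = resample_class_counts(counts, 0.0)        # distribution unchanged
uniform = resample_class_counts(counts, 1.0)      # both classes equal
halved = resample_class_counts(counts, 0.0, sample_size_percent=50.0)

print(as_is, uniform, halved)
```

Under these assumptions, a bias of 1 shrinks the majority and grows the minority toward the same size, which matches the observation above. If the filter draws with replacement (which I believe is its default), the extra minority instances would be duplicates of existing ones, not synthetic points of the kind SMOTE generates. A drop in both classes at bias 0 would correspond to a sample size below 100%.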

I tried the value 1 for that parameter without using SMOTE to increase the minority class instances, because the data is already balanced, and the result is good too.

So, is that the same as combining SMOTE and under-sampling, or do I still have to use the value 0 for that parameter and then apply SMOTE?


Answer 1:


For undersampling, see the EasyEnsemble algorithm (a Weka implementation was developed by Schubach, Robinson, and Valentini).

The EasyEnsemble algorithm allows you to split the data into a certain number of balanced partitions. To achieve this balance, set the numIterations parameter equal to:

numIterations = (# of majority instances) / (# of minority instances)

For example, if there are 30 total instances with 20 in the majority class and 10 in the minority class, set the numIterations parameter equal to 2 (i.e., 20 majority instances / 10 minority instances = 2 balanced partitions). Each of these 2 partitions should contain 20 instances: the same 10 minority instances along with a different 10 instances from the majority class.
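The partitioning in this example can be sketched in a few lines of plain Python. This is not the Weka EasyEnsemble implementation: the function name `balanced_partitions`, the `seed` argument, and the tuple-based toy data are hypothetical, and the sketch assumes the majority class is split into disjoint chunks as described above.

```python
import random


def balanced_partitions(majority, minority, seed=0):
    """Split the majority class into minority-sized chunks and pair each
    chunk with the full minority class."""
    # numIterations = (# majority instances) / (# minority instances)
    num_iterations = len(majority) // len(minority)

    # Shuffle a copy so the chunks are random but reproducible.
    shuffled = list(majority)
    random.Random(seed).shuffle(shuffled)

    size = len(minority)
    partitions = []
    for i in range(num_iterations):
        chunk = shuffled[i * size:(i + 1) * size]
        partitions.append(chunk + list(minority))
    return partitions


# Toy data matching the example: 20 majority and 10 minority instances.
majority = [("maj", i) for i in range(20)]
minority = [("min", i) for i in range(10)]

parts = balanced_partitions(majority, minority)
print(len(parts), [len(p) for p in parts])
```

When the majority count is not an exact multiple of the minority count, the integer division here simply leaves the remaining majority instances unused. That is a choice of this sketch, not something the answer specifies.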

The algorithm then trains classifiers on each of the balanced partitions, and at test time, ensembles the batch of classifiers trained on each of the balanced partitions for prediction.
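As a rough illustration of the train-then-ensemble step, the sketch below fits one very simple classifier per balanced partition and combines them by majority vote. Everything here is an assumption made for illustration: the one-feature toy data, the threshold classifier (`train_threshold`), and the voting rule (`predict_ensemble`) stand in for whatever base learner and combination scheme the actual implementation uses.

```python
import random
from statistics import mean


def train_threshold(partition):
    """Fit a one-feature classifier: the midpoint between the class means.

    Assumes class 1 (minority) tends to have larger feature values than
    class 0 (majority), which holds for the toy data below.
    """
    pos = [x for x, y in partition if y == 1]
    neg = [x for x, y in partition if y == 0]
    return (mean(pos) + mean(neg)) / 2


def predict_ensemble(thresholds, x):
    """Majority vote over the per-partition classifiers (ties go to class 1)."""
    votes = sum(1 for t in thresholds if x > t)
    return 1 if 2 * votes >= len(thresholds) else 0


# Toy data: 20 majority instances (label 0) and 10 minority instances (label 1).
majority = [(i * 0.1, 0) for i in range(20)]
minority = [(4.0 + i * 0.1, 1) for i in range(10)]

# Two balanced partitions: the same minority, different majority chunks.
random.Random(0).shuffle(majority)
partitions = [majority[i:i + 10] + minority for i in range(0, 20, 10)]

# One classifier per partition, combined at prediction time.
thresholds = [train_threshold(p) for p in partitions]
print(predict_ensemble(thresholds, 5.0), predict_ensemble(thresholds, -1.0))
```

Because every partition contains all minority instances, no minority data is discarded, while each classifier still sees a balanced training set.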



Source: https://stackoverflow.com/questions/27948406/combination-of-smote-and-undersampling-on-weka
