Dealing with unbalanced datasets in Spark MLlib

孤街浪徒 2020-12-12 13:28

I'm working on a particular binary classification problem with a highly unbalanced dataset, and I was wondering if anyone has tried to implement specific techniques for dea

3 Answers
  •  Happy的楠姐
    2020-12-12 14:08

    @dbakr Did you get an answer for your biased prediction on your imbalanced dataset?

    Though I'm not sure it was your original plan, note that if you first subsample the majority class of your dataset by a ratio r, then, to get unbiased predictions from Spark's logistic regression, you can either:
    - use the rawPrediction provided by the transform() function and adjust the intercept with log(r), or
    - train your regression with weights using .setWeightCol("classWeightCol") (see the article cited here to figure out the value that must be set in the weights).
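
To make the intercept adjustment concrete, here is a minimal sketch in plain Python (no Spark needed). It assumes that r is the fraction of majority-class rows that were kept (0 < r <= 1) and that the majority class is the negative label 0; the answer does not spell out either point. Under that assumption, the model trained on the subsampled data overstates the log-odds by -log(r), so adding log(r) to the margin undoes the shift. In Spark's binary logistic regression the margin is the second element of the rawPrediction vector. The helper names (`corrected_probability`, `majority_row_weight`) are hypothetical, and the 1/r weight is a standard inverse-sampling choice, not necessarily the value given in the article the answer cites.

```python
import math


def sigmoid(z: float) -> float:
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def corrected_probability(margin: float, r: float) -> float:
    """Probability of the positive class after undoing majority subsampling.

    margin: log-odds produced by the model trained on the subsampled data
            (assumed to correspond to rawPrediction[1] in Spark).
    r:      assumed fraction of majority (negative) rows that were kept.
    """
    if not 0.0 < r <= 1.0:
        raise ValueError("r must be in (0, 1]")
    # Shifting the margin by log(r) is the same as adding log(r) to the intercept.
    return sigmoid(margin + math.log(r))


def majority_row_weight(r: float) -> float:
    """Inverse-sampling weight for each kept majority row (assumption: 1/r).

    Minority rows would keep weight 1.0 under this scheme.
    """
    if not 0.0 < r <= 1.0:
        raise ValueError("r must be in (0, 1]")
    return 1.0 / r


if __name__ == "__main__":
    # A margin of 0 looks like a 50/50 call on the subsampled data, but with
    # only 10% of the majority class kept the corrected probability is far lower.
    print(corrected_probability(0.0, 0.1))
    print(majority_row_weight(0.1))
```

The two options are alternatives: either correct the scores after training on unweighted subsampled data, or train with the weights and use the model's probabilities directly. Applying both would correct twice.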
