Dealing with unbalanced datasets in Spark MLlib

孤街浪徒 2020-12-12 13:28

I'm working on a particular binary classification problem with a highly unbalanced dataset, and I was wondering if anyone has tried to implement specific techniques for dea

3 answers

Happy的楠姐 (original poster)

2020-12-12 14:08

@dbakr Did you get an answer for your biased prediction on your imbalanced dataset?

Though I'm not sure it was your original plan, note that if you first subsample the majority class of your dataset by a ratio r, then, to get unbiased predictions from Spark's logistic regression, you can either:
- use the rawPrediction provided by the transform() function and adjust the intercept with log(r), or
- train your regression with weights using .setWeightCol("classWeightCol") (see the article cited here to figure out the value that must be set in the weights).
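To make the first option concrete, here is a minimal sketch of the log(r) correction in plain Python, outside Spark. It assumes the majority class is the negative class (label 0), that each majority row was kept with probability r (0 < r <= 1), and that you have already pulled the positive-class margin out of rawPrediction (as far as I know, for binary logistic regression that is the second element of the vector). The function name corrected_probability is made up for illustration.

```python
import math


def corrected_probability(raw_margin: float, r: float) -> float:
    """Map a margin learned on subsampled data back to a probability
    on the original class balance.

    raw_margin: positive-class log-odds from the model trained on the
                subsampled data (assumption: extracted from rawPrediction).
    r:          fraction of majority (negative) rows kept, 0 < r <= 1.
    """
    if not 0.0 < r <= 1.0:
        raise ValueError("r must be in (0, 1]")
    # Subsampling negatives by r multiplies the positive odds by 1/r,
    # so adding log(r) to the log-odds undoes that inflation.
    adjusted_margin = raw_margin + math.log(r)
    # Standard sigmoid on the adjusted log-odds.
    return 1.0 / (1.0 + math.exp(-adjusted_margin))


# A row the subsampled model scores at 50/50, with 10% of negatives kept:
p = corrected_probability(0.0, 0.1)
print(round(p, 4))
```

With r = 1 (no subsampling) the correction is a no-op, and smaller values of r pull the predicted probability down, which is what you want when the training set over-represents the minority class.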
