What is the difference between the sample_weight and class_weight options in scikit-learn?

我寻月下人不归

I have a class imbalance problem and want to solve it using cost-sensitive learning.

1) undersample and oversample
2) give weights to class to use a mod

2 Answers

[愿得一人]

2020-12-25 14:00

The two concepts are similar. With sample_weight you make the estimator pay more attention to particular samples; with class_weight you make it pay more attention to particular classes. Setting sample_weight=0 or class_weight=0 means the estimator ignores those samples or classes during training, so a classifier, for example, will never predict a class whose class_weight is 0. If some samples or classes have a larger weight than others, the estimator prioritizes minimizing the error on them. You can use user-defined sample_weight and class_weight values simultaneously.
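As a rough illustration of where each option goes, here is a minimal sketch. The toy data and the weight values are made up; the choice of LogisticRegression and DecisionTreeClassifier is only an assumption for the demo, and other estimators may handle weights differently.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

rng = np.random.RandomState(0)
# Toy imbalanced data (made up): 90 samples of class 0, 10 of class 1.
X = np.vstack([rng.normal(0.0, 1.0, size=(90, 2)),
               rng.normal(1.5, 1.0, size=(10, 2))])
y = np.array([0] * 90 + [1] * 10)

# class_weight: one weight per class, set on the estimator.
clf_cw = LogisticRegression(class_weight={0: 1.0, 1: 9.0}).fit(X, y)

# sample_weight: one weight per sample, passed to fit().
w = np.where(y == 1, 9.0, 1.0)
clf_sw = LogisticRegression().fit(X, y, sample_weight=w)

# A class with weight 0 contributes nothing to training, so this tree
# effectively only sees class 0 and should predict class 0 everywhere.
tree = DecisionTreeClassifier(class_weight={0: 1.0, 1: 0.0}, random_state=0).fit(X, y)
pred_zero = tree.predict(X)
```

Here clf_cw and clf_sw should end up with the same coefficients, because the per-sample weights in w simply spell out the per-class weights sample by sample.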

If you undersample or oversample your training set by simply removing or cloning samples, the effect is equivalent to decreasing or increasing the corresponding sample_weight/class_weight values.
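The equivalence between cloning and weighting can be checked on a small example. This is only a sketch on made-up data, assuming an estimator whose loss is a weighted sum over samples (LogisticRegression here); the tight tol and large max_iter are just there so both fits converge to nearly the same optimum.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)
# Toy imbalanced data (made up): 90 samples of class 0, 10 of class 1.
X = np.vstack([rng.normal(0.0, 1.0, size=(90, 2)),
               rng.normal(1.5, 1.0, size=(10, 2))])
y = np.array([0] * 90 + [1] * 10)

# Oversampling by cloning: append 2 extra copies of each minority sample,
# so every minority sample appears 3 times in total.
X_over = np.vstack([X, np.repeat(X[y == 1], 2, axis=0)])
y_over = np.concatenate([y, np.ones(20, dtype=int)])
clf_cloned = LogisticRegression(tol=1e-8, max_iter=1000).fit(X_over, y_over)

# Same idea without copying: weight 3 for minority samples, 1 otherwise.
w = np.where(y == 1, 3.0, 1.0)
clf_weighted = LogisticRegression(tol=1e-8, max_iter=1000).fit(X, y, sample_weight=w)

# Largest difference between the two coefficient vectors (expected to be tiny).
max_diff = np.abs(clf_cloned.coef_ - clf_weighted.coef_).max()
```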

In more complex cases you can also generate artificial samples with techniques such as SMOTE.
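To give a feel for what generating artificial samples means, here is a simplified SMOTE-style sketch: each new point is a random interpolation between a minority sample and one of its nearest minority neighbours. The helper smote_like is hypothetical and written for this illustration only; it is not the implementation from the imbalanced-learn package, which is what you would normally use in practice.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def smote_like(X_min, n_new, k=5, seed=0):
    """Hypothetical helper: create n_new synthetic points by interpolating
    between minority samples and their k nearest minority neighbours."""
    rng = np.random.RandomState(seed)
    # Ask for k + 1 neighbours because each point is normally returned as
    # its own nearest neighbour (assumes no duplicate points in X_min).
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    neigh_idx = nn.kneighbors(X_min, return_distance=False)[:, 1:]
    base = rng.randint(0, len(X_min), size=n_new)   # random minority samples
    pick = rng.randint(0, k, size=n_new)            # one random neighbour each
    neighbours = X_min[neigh_idx[base, pick]]
    gap = rng.uniform(0.0, 1.0, size=(n_new, 1))    # interpolation factor in [0, 1)
    return X_min[base] + gap * (neighbours - X_min[base])

# Usage on made-up minority-class data: 10 real points, 20 synthetic ones.
X_min = np.random.RandomState(1).normal(size=(10, 2))
X_new = smote_like(X_min, n_new=20, k=3)
```

The synthetic points lie on line segments between existing minority samples, so they stay inside the region the minority class already occupies.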

醉酒成梦

2020-12-25 14:25

sample_weight and class_weight have a similar function: making your estimator pay more attention to some samples.

The actual per-sample weight is sample_weight multiplied by the weight from class_weight.
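The multiplication can be made explicit with compute_sample_weight from sklearn.utils.class_weight, which expands a class_weight setting into one weight per sample. The following is a sketch on made-up data, using LogisticRegression as an assumed example estimator; the specific weight values are arbitrary.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.utils.class_weight import compute_sample_weight

rng = np.random.RandomState(0)
# Toy imbalanced data (made up): 90 samples of class 0, 10 of class 1.
X = np.vstack([rng.normal(0.0, 1.0, size=(90, 2)),
               rng.normal(1.5, 1.0, size=(10, 2))])
y = np.array([0] * 90 + [1] * 10)

class_weight = {0: 1.0, 1: 5.0}
user_w = rng.uniform(0.5, 2.0, size=len(y))  # arbitrary per-sample weights

# Both kinds of weights supplied separately.
clf_both = LogisticRegression(class_weight=class_weight).fit(X, y, sample_weight=user_w)

# Expand class_weight to one value per sample and multiply by hand.
effective_w = user_w * compute_sample_weight(class_weight, y)
clf_product = LogisticRegression().fit(X, y, sample_weight=effective_w)
```

Both fits should give the same model, since the second one just passes the already-multiplied weights.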

This serves the same purpose as under/oversampling, but the behavior is likely to differ: if an algorithm randomly picks samples (as random forests do), it matters whether you oversampled or not.

To sum it up:
class_weight and sample_weight both do 2), and option 2) is one way to handle class imbalance. I don't know of a universally recommended approach; I would try 1), 2), and 1) + 2) on your specific problem to see what works best.
