What is the difference between the sample_weight and class_weight options in scikit-learn?

我寻月下人不归 2020-12-25 13:48

I have a class imbalance problem and want to solve it using cost-sensitive learning. As I understand it, there are two options:

  1. under-sampling and over-sampling
  2. giving weights to the classes so the model uses a modified, cost-sensitive loss
2 Answers
  •  醉酒成梦
    2020-12-25 14:25

    sample_weight and class_weight serve a similar purpose: they make your estimator pay more attention to some samples.

    The actual per-sample weights will be sample_weight multiplied by the weights derived from class_weight.
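    A minimal sketch of that multiplication, using scikit-learn's `compute_sample_weight` helper to express `class_weight` as per-sample weights (the toy data and weight values here are made up for illustration):

    ```python
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.utils.class_weight import compute_sample_weight

    # toy imbalanced data: 8 negatives, 2 positives
    X = np.arange(10, dtype=float).reshape(-1, 1)
    y = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])

    # class_weight expressed as an equivalent per-sample weight vector
    cw = compute_sample_weight(class_weight={0: 1.0, 1: 4.0}, y=y)

    # additional per-sample weights (e.g., how much we trust each label)
    sw = np.ones_like(y, dtype=float)
    sw[0] = 0.5

    # fitting with class_weight in the estimator plus sample_weight in fit()
    # matches fitting with the elementwise product cw * sw
    clf_a = LogisticRegression(class_weight={0: 1.0, 1: 4.0}).fit(X, y, sample_weight=sw)
    clf_b = LogisticRegression().fit(X, y, sample_weight=cw * sw)

    print(np.allclose(clf_a.coef_, clf_b.coef_))  # → True
    ```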

    This serves the same purpose as under-/over-sampling, but the behavior is likely to differ: if an algorithm randomly picks samples (as random forests do when bootstrapping), it matters whether you over-sampled (duplicated rows can each be drawn) or merely assigned weights.

    To sum it up:
    class_weight and sample_weight both implement option 2), and option 2) is one way to handle class imbalance. I don't know of a universally recommended way; I would try 1), 2), and 1) + 2) on your specific problem to see what works best.
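    As a small illustration of option 2), the sketch below (synthetic data; the threshold and seed are arbitrary assumptions) shows how `class_weight='balanced'` shifts a classifier toward the minority class:

    ```python
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.RandomState(0)
    # synthetic imbalanced problem: positives are rare (high threshold)
    X = rng.randn(1000, 2)
    y = (X[:, 0] + X[:, 1] + rng.randn(1000) > 2.5).astype(int)

    plain = LogisticRegression().fit(X, y)
    weighted = LogisticRegression(class_weight='balanced').fit(X, y)

    # the balanced model predicts the minority class more often
    print(plain.predict(X).mean(), weighted.predict(X).mean())
    ```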
