What is the difference between the sample weight and class weight options in scikit-learn?

我寻月下人不归 2020-12-25 13:48

I have a class imbalance problem and want to solve it using cost-sensitive learning.

  1. undersample and oversample
  2. give weights to class to use a mod
2 answers
  • 2020-12-25 14:00

    These are similar concepts: with sample_weight you make the estimator pay more attention to particular samples, and with class_weight you make it pay more attention to a particular class. Setting sample_weight=0 or class_weight=0 basically means the estimator ignores those samples or classes during learning, so a classifier, for example, will never predict a class whose class_weight is 0. If some samples or classes have a larger weight than others, the estimator will prioritize minimizing the error on them. You can use user-defined sample_weight and class_weight simultaneously.
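
    As a minimal sketch of where each option goes: class_weight is given when the estimator is constructed, while sample_weight is passed to fit(). The synthetic data, the choice of LogisticRegression, and the "trust the first 500 rows more" rule are assumptions made only for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score

# Synthetic imbalanced data (illustrative): roughly 90% class 0, 10% class 1.
X, y = make_classification(n_samples=1000, n_features=5,
                           weights=[0.9, 0.1], random_state=0)

# Unweighted baseline.
plain = LogisticRegression(max_iter=1000).fit(X, y)

# class_weight: one weight per class, set on the estimator.
# "balanced" weights each class inversely to its frequency.
by_class = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X, y)

# sample_weight: one weight per training row, passed to fit().
# Hypothetical rule: the first 500 rows count twice as much as the rest.
w = np.ones(len(y))
w[:500] = 2.0
by_sample = LogisticRegression(max_iter=1000).fit(X, y, sample_weight=w)

# Recall on the minority class, measured on the training data for brevity.
recall_plain = recall_score(y, plain.predict(X))
recall_by_class = recall_score(y, by_class.predict(X))
print(recall_plain, recall_by_class)
```

    Upweighting the minority class usually raises its recall at the cost of more false positives, so in practice you would compare the models on held-out data.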

    If you undersample or oversample your training set by simply removing or cloning rows, this is equivalent to decreasing or increasing the corresponding sample_weight/class_weight values.
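
    A small sketch of that equivalence, assuming a deterministic estimator whose loss is a weighted sum over samples (LogisticRegression here; the data and the factor of 3 are illustrative). Cloning every minority row so it appears three times should give nearly the same fit as keeping each row once with weight 3.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_features=3, n_informative=3,
                           n_redundant=0, weights=[0.9, 0.1], random_state=0)

# Oversample by cloning: each minority-class row appears 3 times in total.
reps = np.where(y == 1, 3, 1)
X_over = np.repeat(X, reps, axis=0)
y_over = np.repeat(y, reps)

# Tight tolerance so both fits get close to the same optimum.
params = dict(tol=1e-8, max_iter=5000)

clf_over = LogisticRegression(**params).fit(X_over, y_over)

# Weighting instead: keep each row once, give minority rows weight 3.
clf_weighted = LogisticRegression(**params).fit(
    X, y, sample_weight=reps.astype(float))

# Largest difference between the two coefficient vectors.
coef_gap = np.max(np.abs(clf_over.coef_ - clf_weighted.coef_))
print(coef_gap)
```

    The gap is only expected to be small up to solver tolerance, and the equivalence need not hold for estimators that subsample rows at random.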

    In more complex cases you can also try generating samples artificially, with techniques like SMOTE.

  • 2020-12-25 14:25

    sample_weight and class_weight have a similar function: making your estimator pay more attention to some samples.

    The actual sample weights will be sample_weight multiplied by the weights from class_weight.
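
    The product rule can be checked with a short sketch, here for a decision tree (the data, the weights, and the choice of estimator are assumptions for illustration). compute_sample_weight expands a class_weight dict into one weight per row, so multiplying it by sample_weight by hand should reproduce the fit obtained by passing both options at once.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils.class_weight import compute_sample_weight

X, y = make_classification(n_samples=500, n_features=5,
                           weights=[0.9, 0.1], random_state=0)

class_weight = {0: 1.0, 1: 5.0}

# Arbitrary per-row weights (illustrative only).
rng = np.random.default_rng(0)
sample_weight = rng.uniform(0.5, 2.0, size=len(y))

# Option A: pass both class_weight and sample_weight.
tree_both = DecisionTreeClassifier(max_depth=3, class_weight=class_weight,
                                   random_state=0)
tree_both.fit(X, y, sample_weight=sample_weight)

# Option B: multiply the two by hand and pass only sample_weight.
combined = sample_weight * compute_sample_weight(class_weight, y)
tree_manual = DecisionTreeClassifier(max_depth=3, random_state=0)
tree_manual.fit(X, y, sample_weight=combined)

# Both trees should give the same predicted probabilities.
same = np.allclose(tree_both.predict_proba(X), tree_manual.predict_proba(X))
print(same)
```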

    This serves the same purpose as under/oversampling, but the behavior is likely to be different: if the algorithm randomly picks samples (as random forests do), it matters whether you oversampled or not.

    To sum it up:
    class_weight and sample_weight both do 2), and option 2) is one way to handle class imbalance. I don't know of a universally recommended way; I would try 1), 2) and 1) + 2) on your specific problem to see what works best.
