What is the difference between the sample_weight and class_weight options in scikit-learn?

我寻月下人不归 2020-12-25 13:48

I have a class imbalance problem and want to solve it using cost-sensitive learning. As I understand it, there are two options:

  1. undersample and oversample the data
  2. give weights to the classes in the model
2 answers
  •  [愿得一人]
    2020-12-25 14:00

    These are similar concepts. With sample_weight you can force the estimator to pay more attention to certain individual samples, while with class_weight you can force it to pay more attention to a particular class. Setting sample_weight=0 for a sample, or class_weight=0 for a class, means the estimator does not take that sample/class into account at all during training; a classifier, for example, will never predict a class whose class_weight is 0. If some sample/class has a larger weight than the others, the estimator will try to minimize the error on that sample/class first. You can also use user-defined sample_weight and class_weight simultaneously.
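
    To illustrate the relationship, here is a minimal sketch (the dataset and the weight values are made up for the example): in scikit-learn, `class_weight` is a constructor option, while `sample_weight` is passed to `fit()`. Giving each minority sample a `sample_weight` equal to that class's `class_weight` produces the same model.

    ```python
    # Sketch: class_weight vs sample_weight on a toy imbalanced dataset.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.RandomState(0)
    # 90 samples of class 0, 10 samples of class 1
    X = np.vstack([rng.normal(0, 1, (90, 2)), rng.normal(2, 1, (10, 2))])
    y = np.array([0] * 90 + [1] * 10)

    # class_weight: upweight the rare class (class_weight="balanced" would
    # compute similar weights automatically from the class frequencies)
    clf_cw = LogisticRegression(class_weight={0: 1.0, 1: 9.0}).fit(X, y)

    # sample_weight: per-sample weights passed to fit(); here each minority
    # sample gets weight 9, mirroring the class_weight dict above
    w = np.where(y == 1, 9.0, 1.0)
    clf_sw = LogisticRegression().fit(X, y, sample_weight=w)

    # Both weightings define the same objective, so the coefficients agree
    print(np.allclose(clf_cw.coef_, clf_sw.coef_, atol=1e-6))
    ```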

    If you undersample/oversample your training set by simply removing or cloning samples, the effect is equivalent to decreasing or increasing the corresponding sample_weight/class_weight.

    In more complex cases you can also generate samples artificially, with techniques like SMOTE.
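
    The production-ready implementation is `imblearn.over_sampling.SMOTE` from the imbalanced-learn package; the sketch below only illustrates the core idea in plain NumPy (the helper name and toy data are my own): a synthetic minority sample is created by interpolating between a minority point and one of its minority-class nearest neighbors.

    ```python
    # Simplified sketch of the SMOTE idea (not the imbalanced-learn API):
    # interpolate between a minority point and a random near neighbor.
    import numpy as np

    rng = np.random.RandomState(42)
    minority = rng.normal(2.0, 0.5, size=(10, 2))  # toy minority-class points

    def smote_like_sample(points, rng, k=3):
        """Generate one synthetic point on the segment to a nearest neighbor."""
        i = rng.randint(len(points))
        p = points[i]
        d = np.linalg.norm(points - p, axis=1)   # distances to all points
        neighbors = np.argsort(d)[1:k + 1]       # k nearest, excluding itself
        q = points[rng.choice(neighbors)]
        gap = rng.rand()                         # random spot on the segment
        return p + gap * (q - p)

    synthetic = np.array([smote_like_sample(minority, rng) for _ in range(5)])
    print(synthetic.shape)  # (5, 2)
    ```

    Each synthetic point lies on a line segment between two real minority samples, so the new data stays inside the region the minority class already occupies.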
