Unbalanced classification using RandomForestClassifier in sklearn

借酒劲吻你 2020-12-07 15:44

I have a dataset where the classes are unbalanced. The classes are either '1' or '0', where the ratio of class '1':'0' is 5:1. How do you calculate the prediction e

4 answers
  •  甜味超标
    2020-12-07 16:38

    You can pass a sample_weight argument to the Random Forest fit method:

    sample_weight : array-like, shape = [n_samples] or None
    

    Sample weights. If None, then samples are equally weighted. Splits that would create child nodes with net zero or negative weight are ignored while searching for a split in each node. In the case of classification, splits are also ignored if they would result in any single class carrying a negative weight in either child node.

    In older versions there was a preprocessing.balance_weights method that generated balancing weights for the given samples, such that the classes become uniformly distributed. It is still there, in the internal but still usable preprocessing._weights module, but it is deprecated and will be removed in future versions. I don't know the exact reasons for this.
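As a rough sketch of a non-deprecated way to get balancing weights, the snippet below uses compute_sample_weight from sklearn.utils.class_weight. This is a different helper from the balance_weights function mentioned above, and the toy label array y is an assumption made up purely for illustration.

```python
import numpy as np
from sklearn.utils.class_weight import compute_sample_weight

# Toy labels (hypothetical): five samples of class 1, one sample of class 0.
y = np.array([1, 1, 1, 1, 1, 0])

# "balanced" weights each sample by n_samples / (n_classes * class_count),
# so every class ends up with the same total weight.
sample_weight = compute_sample_weight(class_weight="balanced", y=y)

total_weight_class_0 = sample_weight[y == 0].sum()
total_weight_class_1 = sample_weight[y == 1].sum()
```

The resulting array has one entry per sample and can be passed to fit as sample_weight. RandomForestClassifier also accepts class_weight="balanced" in its constructor, which may be simpler when per-sample control is not needed.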

    Update

    Some clarification, as you seem to be confused. sample_weight usage is straightforward once you remember that its purpose is to balance the target classes in the training dataset. That is, if you have X as observations and y as classes (labels), then len(X) == len(y) == len(sample_weight), and each element of the 1-d sample_weight array represents the weight of the corresponding (observation, label) pair. In your case, if class 1 is represented 5 times as often as class 0 and you want to balance the class distributions, you could simply use

    import numpy as np
    sample_weight = np.array([5 if i == 0 else 1 for i in y])
    

    which assigns a weight of 5 to all 0 instances and a weight of 1 to all 1 instances. See the link above for the somewhat more elaborate balance_weights weight evaluation function.
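To make the whole flow concrete, here is a minimal end-to-end sketch. The synthetic data (500 samples of class 1, 100 of class 0, four Gaussian features) and the hyperparameters are assumptions for illustration only, not something taken from the question.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic data (hypothetical): 500 samples of class 1 and 100 of class 0,
# matching the 5:1 ratio described in the question.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=1.0, size=(500, 4)),   # class 1
    rng.normal(loc=-1.0, size=(100, 4)),  # class 0
])
y = np.array([1] * 500 + [0] * 100)

# One weight per (observation, label) pair: 5 for the minority class 0,
# 1 for the majority class 1.
sample_weight = np.where(y == 0, 5, 1)

# With these weights both classes carry the same total weight.
weight_class_0 = sample_weight[y == 0].sum()
weight_class_1 = sample_weight[y == 1].sum()

clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(X, y, sample_weight=sample_weight)
```

np.where is used here as a vectorized equivalent of the list comprehension above; both produce the same weights.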
