XGBoost for multilabel classification?

一整个雨季 2020-12-30 03:10

Is it possible to use XGBoost for multi-label classification? Currently I use OneVsRestClassifier over GradientBoostingClassifier from sklearn.

3 Answers
  •  鱼传尺愫
    2020-12-30 03:31

    There are a couple of ways to do that, one of which is the one you already suggested:

    1.

    from xgboost import XGBClassifier
    from sklearn.multiclass import OneVsRestClassifier
    # If you want to avoid the OneVsRestClassifier magic switch
    # from sklearn.multioutput import MultiOutputClassifier
    
    clf_multilabel = OneVsRestClassifier(XGBClassifier(**params))
    

    clf_multilabel will fit one binary classifier per class, using however many cores you specify in params (FYI: you can also specify n_jobs in OneVsRestClassifier, but that uses more memory).
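
    As a minimal end-to-end sketch of this approach (using sklearn's GradientBoostingClassifier from the question and a synthetic dataset; any sklearn-compatible estimator, including XGBClassifier, plugs in the same way):

    ```python
    from sklearn.datasets import make_multilabel_classification
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.multiclass import OneVsRestClassifier

    # Synthetic multilabel data: Y is a (100, 3) binary indicator matrix
    X, Y = make_multilabel_classification(n_samples=100, n_classes=3, random_state=0)

    # One binary classifier is fit per label column of Y
    clf = OneVsRestClassifier(GradientBoostingClassifier(n_estimators=10))
    clf.fit(X, Y)

    # One probability column per label, independent of the others
    proba = clf.predict_proba(X)
    ```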

    2. If you first massage your data a little by making k copies of every data point that has k correct labels, you can hack your way to a simpler multiclass problem. At that point, just

    clf = XGBClassifier(**params)
    clf.fit(train_data, train_labels)
    pred_proba = clf.predict_proba(test_data)
    

    to get classification margins/probabilities for each class and decide what threshold you want for predicting a label. Note that this solution is not exact: if a product has tags (1, 2, 3), you artificially introduce two negative samples for each class.
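
    The duplication step above can be sketched like this (`expand_multilabel` is a hypothetical helper name, and `Y` is assumed to be a binary indicator matrix):

    ```python
    import numpy as np

    def expand_multilabel(X, Y):
        """Duplicate each row once per positive label, turning a
        multilabel indicator matrix Y into a flat multiclass target."""
        rows, labels = np.nonzero(Y)  # one (row, label) pair per positive entry
        return X[rows], labels

    X = np.array([[0.1, 0.2],
                  [0.3, 0.4]])
    Y = np.array([[1, 1, 1],   # point 0 has tags 0, 1, 2
                  [0, 1, 0]])  # point 1 has tag 1
    X_exp, y_exp = expand_multilabel(X, Y)
    # point 0 is copied three times (once per tag), point 1 once
    ```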
