Question
I built an XGBoost model using scikit-learn and I am pretty happy with it. As a fine-tuning step to avoid overfitting, I'd like to enforce monotonicity on some features, but that is where I start running into difficulties...
As far as I understand, scikit-learn has no documentation on xgboost (which I confess really surprises me, given that this has been the case for several months). The only documentation I found is directly at http://xgboost.readthedocs.io
On that site, I found that monotonicity can be enforced using the "monotone_constraints" option. I tried to use it through scikit-learn, but I got the error "TypeError: __init__() got an unexpected keyword argument 'monotone_constraints'".
Do you know a way to do this?
Here is the code I wrote in Python (using Spyder):
import xgboost as xgb

monotonic_indexes = "(1,-1)"  # e.g. increasing in the first feature, decreasing in the second
grid = {'learning_rate': 0.01, 'subsample': 0.5, 'colsample_bytree': 0.5,
        'max_depth': 6, 'min_child_weight': 10, 'gamma': 1,
        'monotone_constraints': monotonic_indexes}
m07_xgm06 = xgb.XGBClassifier(n_estimators=2000, **grid)
m07_xgm06.fit(X_train_v01_oe, Label_train, early_stopping_rounds=10,
              eval_metric="logloss", eval_set=[(X_test1_v01_oe, Label_test1)])
Answer 1:
In order to do this using the xgboost sklearn API, you need to upgrade to xgboost 0.81. The ability to set parameters passed via kwargs was fixed as part of this PR: https://github.com/dmlc/xgboost/pull/3791
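For illustration, here is a minimal sketch of that route; it assumes xgboost >= 0.81 is installed, and the toy data and model settings below are invented, not from the answer:

import numpy as np
import xgboost as xgb

X = np.random.rand(100, 2)
y = (X[:, 0] > X[:, 1]).astype(int)

# "(1,-1)": predictions non-decreasing in the first feature,
# non-increasing in the second
model = xgb.XGBClassifier(n_estimators=50, monotone_constraints="(1,-1)")
model.fit(X, y)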
Answer 2:
The XGBoost scikit-learn API currently (0.6a2) doesn't support monotone_constraints. You can use the native Python API instead. Take a look at the example.
This code in the example can be removed:
params_constr['updater'] = "grow_monotone_colmaker,prune"
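As a hedged sketch of the native-API route (the data, parameter values, and variable names below are invented for illustration), passing monotone_constraints in the params dict looks like this:

import numpy as np
import xgboost as xgb

X = np.random.rand(200, 2)
y = (X[:, 0] > X[:, 1]).astype(int)
dtrain = xgb.DMatrix(X, label=y)

params_constr = {
    'objective': 'binary:logistic',
    'monotone_constraints': '(1,-1)',  # increasing in column 0, decreasing in column 1
}
bst = xgb.train(params_constr, dtrain, num_boost_round=50)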
Answer 3:
How would you expect monotone constraints to work for a general classification problem where the response might have more than 2 levels? All the examples I've seen relating to this functionality are for regression problems. If your classification response only has 2 levels, try switching to regression on an indicator variable and then choose an appropriate score threshold for classification.
This feature appears to work as of the latest xgboost / scikit-learn, provided that you use an XGBRegressor rather than an XGBClassifier and set monotone_constraints via kwargs.
The syntax is like this:
params = {
    'monotone_constraints': '(-1,0,1)'
}
normalised_weighted_poisson_model = XGBRegressor(**params)
In this example, there is a negative constraint on the first column of the training data, no constraint on the second, and a positive constraint on the third. It is up to you to keep track of which is which: you cannot refer to columns by name, only by position, and you must specify an entry in the constraint tuple for every column in your training data.
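To make that concrete, here is a self-contained sketch (synthetic data and invented names, not from the answer) that fits a regressor under '(-1,0,1)' and spot-checks the positive constraint on the third column:

import numpy as np
from xgboost import XGBRegressor

rng = np.random.RandomState(0)
X = rng.rand(500, 3)
y = -2 * X[:, 0] + 3 * X[:, 2] + 0.1 * rng.randn(500)

model = XGBRegressor(n_estimators=200, monotone_constraints='(-1,0,1)')
model.fit(X, y)

# Sweep the third feature (positive constraint) while holding the others
# fixed; the constrained predictions should never decrease.
grid = np.tile([[0.5, 0.5, 0.0]], (50, 1))
grid[:, 2] = np.linspace(0.0, 1.0, 50)
preds = model.predict(grid)
assert np.all(np.diff(preds) >= -1e-6)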
Source: https://stackoverflow.com/questions/43076451/how-to-enforce-monotonic-constraints-in-xgboost-with-scikitlearn