What is the difference between xgb.train and xgb.XGBRegressor (or xgb.XGBClassifier)?

我寻月下人不归 2020-12-14 17:56

I already know "xgboost.XGBRegressor is a Scikit-Learn Wrapper interface for XGBoost."

But do they have any other difference?

3 Answers
  • 2020-12-14 18:38

    xgboost.train is the low-level API to train a model via the gradient boosting method.

    xgboost.XGBRegressor and xgboost.XGBClassifier are the wrappers (Scikit-Learn-like wrappers, as they call them) that prepare the DMatrix and pass in the corresponding objective function and parameters. In the end, the fit call simply boils down to:

    self._Booster = train(params, dmatrix,
                          self.n_estimators, evals=evals,
                          early_stopping_rounds=early_stopping_rounds,
                          evals_result=evals_result, obj=obj, feval=feval,
                          verbose_eval=verbose)
    

    This means that everything that can be done with XGBRegressor and XGBClassifier is doable via the underlying xgboost.train function. The converse is obviously not true: for instance, some useful parameters of xgboost.train are not supported in the XGBModel API. The list of notable differences includes (a sketch contrasting the two call styles follows the list):

    • xgboost.train allows you to set callbacks applied at the end of each iteration.
    • xgboost.train allows training continuation via the xgb_model parameter.
    • xgboost.train allows not only minimization of the eval function, but maximization as well.
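
    To make the correspondence concrete, here is a minimal sketch of my own (not from the xgboost docs; it assumes xgboost >= 0.90, where the squared-error objective is spelled reg:squarederror) expressing the same ten-round regression through both interfaces:

    import numpy as np
    import xgboost as xgb

    X = np.random.rand(100, 5)
    y = np.random.rand(100)

    # Scikit-Learn wrapper: takes numpy/pandas input directly and builds
    # the DMatrix internally; n_estimators is the number of boosting rounds.
    reg = xgb.XGBRegressor(n_estimators=10, max_depth=3, learning_rate=0.1)
    reg.fit(X, y)

    # Low-level API: explicit DMatrix plus a params dict; num_boost_round
    # plays the role of n_estimators.
    dtrain = xgb.DMatrix(X, label=y)
    params = {"max_depth": 3, "eta": 0.1, "objective": "reg:squarederror"}
    booster = xgb.train(params, dtrain, num_boost_round=10)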
  • 2020-12-14 18:48

    In my opinion, the main difference is the training/prediction speed.

    For reference below, I will call xgboost.train the 'native_implementation' and XGBClassifier.fit the 'sklearn_wrapper'.

    I ran some benchmarks on a dataset of shape (240000, 348):

    Fit/train time:
    • sklearn_wrapper: 89 seconds
    • native_implementation: 7 seconds

    Prediction time:
    • sklearn_wrapper: 6 seconds
    • native_implementation: 3.5 milliseconds

    I believe this is explained by the fact that sklearn_wrapper is designed to take pandas/numpy objects as input and converts them internally, whereas native_implementation needs the input data converted into an xgboost.DMatrix object up front.
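
    As a rough illustration of that mechanism, here is my own sketch (not the original benchmark code; the shapes match the dataset above, but timings will vary by machine):

    import time
    import numpy as np
    import xgboost as xgb

    X = np.random.rand(240000, 348)
    y = np.random.randint(0, 2, size=240000)

    # sklearn_wrapper: predict() takes the raw numpy array and converts it
    # to a DMatrix internally on every call.
    clf = xgb.XGBClassifier(n_estimators=10)
    clf.fit(X, y)
    t0 = time.time()
    clf.predict(X)
    print("sklearn_wrapper predict:", time.time() - t0)

    # native_implementation: the DMatrix is built once, up front, so the
    # predict call itself does no conversion work.
    dmat = xgb.DMatrix(X, label=y)
    booster = xgb.train({"objective": "binary:logistic"}, dmat, num_boost_round=10)
    t0 = time.time()
    booster.predict(dmat)
    print("native_implementation predict:", time.time() - t0)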

    In addition, one can optimise n_estimators (num_boost_round) with the native_implementation, for example via xgb.cv with early stopping, as sketched below.
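
    A hedged sketch of one way to do this (my own example, using the standard xgb.cv helper; with early_stopping_rounds the returned frame is truncated at the best iteration, so its length gives the optimal round count):

    import numpy as np
    import xgboost as xgb

    X = np.random.rand(500, 10)
    y = np.random.randint(0, 2, size=500)
    dtrain = xgb.DMatrix(X, label=y)

    # Cross-validated boosting with early stopping: training halts once the
    # CV metric stops improving, and the result frame ends at the best round.
    cv_results = xgb.cv({"objective": "binary:logistic"}, dtrain,
                        num_boost_round=200, nfold=5,
                        early_stopping_rounds=10, verbose_eval=False)
    print("optimal num_boost_round:", len(cv_results))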

  • 2020-12-14 18:53

    @Maxim, as of xgboost 0.90 (or much earlier), these differences no longer exist, in that xgboost.XGBClassifier.fit:

    • has callbacks
    • allows continuation with the xgb_model parameter
    • and supports the same built-in eval metrics or custom eval functions

    What I find different is evals_result: it has to be retrieved separately after fit (clf.evals_result()), and the resulting dict is different because it can't take advantage of the names of the evals in the watchlist (watchlist = [(d_train, 'train'), (d_valid, 'valid')]). A sketch contrasting the two follows.
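
    A small sketch of my own showing the contrast (the d_train/d_valid names follow the watchlist above; it assumes xgboost >= 0.90):

    import numpy as np
    import xgboost as xgb

    X_train, X_valid = np.random.rand(80, 5), np.random.rand(20, 5)
    y_train, y_valid = np.random.randint(0, 2, 80), np.random.randint(0, 2, 20)

    # Native API: watchlist entries carry names, which become the keys of
    # the evals_result dict that xgb.train fills in during training.
    d_train = xgb.DMatrix(X_train, label=y_train)
    d_valid = xgb.DMatrix(X_valid, label=y_valid)
    watchlist = [(d_train, 'train'), (d_valid, 'valid')]
    evals_result = {}
    xgb.train({"objective": "binary:logistic"}, d_train, num_boost_round=5,
              evals=watchlist, evals_result=evals_result, verbose_eval=False)
    print(evals_result.keys())        # dict_keys(['train', 'valid'])

    # Sklearn wrapper: eval_set takes unnamed (X, y) pairs; the results are
    # retrieved after fit, keyed generically as validation_0, validation_1, ...
    clf = xgb.XGBClassifier(n_estimators=5)
    clf.fit(X_train, y_train,
            eval_set=[(X_train, y_train), (X_valid, y_valid)], verbose=False)
    print(clf.evals_result().keys())  # dict_keys(['validation_0', 'validation_1'])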
