Question
I noticed that there are two possible ways to train an XGBoost model in Python: the native learning API (xgboost.train) and the scikit-learn wrapper (XGBRegressor).
When I ran the same dataset through both implementations with the same hyperparameters, the results were different.
Code
import xgboost
from xgboost import XGBRegressor
import pandas as pd
import numpy as np
from sklearn import datasets
# Note: load_boston was removed in scikit-learn 1.2; this snippet assumes an older version
boston_data = datasets.load_boston()
df = pd.DataFrame(boston_data.data,columns=boston_data.feature_names)
df['target'] = pd.Series(boston_data.target)
Y = df["target"]
X = df.drop('target', axis=1)
#### Code using the native implementation of XGBoost
# Note: missing=0.0 tells XGBoost to treat every 0.0 in X as a missing value
dtrain = xgboost.DMatrix(X, label=Y, missing=0.0)
params = {'max_depth': 3, 'learning_rate': 0.05, 'min_child_weight': 4, 'subsample': 0.8}
evallist = [(dtrain, 'eval'), (dtrain, 'train')]
model = xgboost.train(params, dtrain, num_boost_round=200, evals=evallist)
predictions = model.predict(dtrain)
#### Code using the sklearn wrapper for XGBoost
model2 = XGBRegressor(n_estimators=200, max_depth=3, learning_rate=0.05, min_child_weight=4, subsample=0.8)
# model2 = model2.fit(X, Y, eval_set=[(X, Y), (X, Y)], eval_metric='rmse', verbose=True)
model2 = model2.fit(X, Y)
predictions2 = model2.predict(X)
print(np.absolute(predictions-predictions2).sum())
Absolute difference sum using the sklearn Boston dataset:
62.687134
When I ran the same comparison on other datasets, such as the sklearn diabetes dataset, the difference was much smaller.
Absolute difference sum using the sklearn diabetes dataset:
0.0011711121
Answer 1:
Make sure the random seeds are the same. With subsample=0.8 each implementation randomly samples 80% of the rows for every tree, so unless the seeds match you are comparing two different random draws. For both approaches set the same seed:
param['seed'] = 123
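For example, a minimal sketch of setting the same seed in both APIs (random_state is the wrapper's name for the native seed parameter):

# Native API: the seed goes into the params dict
params = {'max_depth': 3, 'learning_rate': 0.05, 'min_child_weight': 4,
          'subsample': 0.8, 'seed': 123}
model = xgboost.train(params, dtrain, num_boost_round=200)

# Sklearn wrapper: random_state maps to the same seed
model2 = XGBRegressor(n_estimators=200, max_depth=3, learning_rate=0.05,
                      min_child_weight=4, subsample=0.8, random_state=123)
model2 = model2.fit(X, Y)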
EDIT: then there are a couple of other things to check. First, is n_estimators also 200 (it maps to num_boost_round in the native API)? Are you handling missing values the same way, i.e. does the wrapper also treat 0 as missing? And are the other default values the same? (For that last one I think yes, because it is a wrapper, but check the first two.)
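A sketch of how you might check those two points: XGBRegressor accepts a missing argument, so passing missing=0.0 reproduces what DMatrix(X, label=Y, missing=0.0) does on the native side, and save_config() (available in XGBoost 1.0+) dumps the fully resolved parameters of each booster so the remaining defaults can be compared directly:

import json

# Make the wrapper treat zeros as missing, matching DMatrix(..., missing=0.0);
# by default XGBRegressor treats only np.nan as missing
model2 = XGBRegressor(n_estimators=200, max_depth=3, learning_rate=0.05,
                      min_child_weight=4, subsample=0.8,
                      random_state=123, missing=0.0)
model2 = model2.fit(X, Y)

# Dump the resolved parameters of both boosters for side-by-side comparison
native_cfg = json.loads(model.save_config())             # Booster from xgboost.train
wrapper_cfg = json.loads(model2.get_booster().save_config())
print(json.dumps(native_cfg, indent=2))
print(json.dumps(wrapper_cfg, indent=2))

The missing handling is the likely culprit here: Boston columns such as ZN and CHAS contain many genuine zeros, so missing=0.0 discards real information in the native run, while the sklearn diabetes features are mean-centered and contain almost no exact zeros, which would explain why the difference is so much smaller there.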
Source: https://stackoverflow.com/questions/59395651/difference-is-value-between-xgb-train-and-xgb-xgbregressor-in-python-for-certain