Tuning xgboost with xgb.train providing a validation set in R

Question

Related questions here and here. The usual way to tune xgboost (i.e. to choose nrounds) is with xgb.cv, which performs k-fold cross-validation, for example:

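library(xgboost)
data(iris)
set.seed(1)
index = sample(1:150)                        # shuffle the rows
X = as.matrix(iris[index, 1:4])
y = as.numeric(iris[index, "Species"]) - 1   # class labels must start at 0
# num_class is passed inside params; eval_metric is set explicitly because
# newer xgboost releases default to mlogloss for multi:softprob
param = list(eta=0.1, objective="multi:softprob", num_class=3,
             eval_metric="merror")
xgb.cv(params=param, data=X, nrounds=50, nfold=5, label=y)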
> train.merror.mean train.merror.std test.merror.mean test.merror.std
> 1:          0.021667         0.009501         0.040000        0.043461
> 2:          0.018333         0.006972         0.033333        0.047141
> 3:          0.018333         0.006972         0.033333        0.047141
> 4:          0.018333         0.006972         0.033333        0.047141

However, I want to tune xgboost against a fixed validation set, which xgb.cv does not support. It seems that this can be done with xgb.train:

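library(xgboost)
data(iris)
set.seed(1)
index = sample(1:150)
indexTrain = index[1:100]      # 100 rows for training
indexValid = index[101:150]    # 50 rows held out for validation
Xtrain = as.matrix(iris[indexTrain, 1:4])
Xvalid = as.matrix(iris[indexValid, 1:4])
yTrain = as.numeric(iris[indexTrain, "Species"]) - 1
yValid = as.numeric(iris[indexValid, "Species"]) - 1
train = xgb.DMatrix(Xtrain, label=yTrain)
valid = xgb.DMatrix(Xvalid, label=yValid)
# num_class is passed inside params; eval_metric is set explicitly because
# newer xgboost releases default to mlogloss for multi:softprob
param = list(eta=0.1, objective="multi:softprob", num_class=3,
             eval_metric="merror")
# every entry of the watchlist is evaluated after each boosting round
watchlist = list(eval=valid, train=train)
model = xgb.train(params=param, data=train, nrounds=40, watchlist=watchlist)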
>[0]    eval-merror:0.060000    train-merror:0.020000
>[1]    eval-merror:0.060000    train-merror:0.030000
>[2]    eval-merror:0.060000    train-merror:0.020000
>[3]    eval-merror:0.060000    train-merror:0.020000

While xgb.train is running, the evaluation error for each round is printed to the console. However, this information seems to be lost afterwards, since the only elements of model are handle and raw.

QUESTION 1: How can I retrieve the vector of validation errors printed to the console?
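
One possible route for QUESTION 1, sketched under a version assumption: in xgboost R releases from roughly 0.6 to 1.7.x, the booster returned by xgb.train carries an evaluation_log element, a table with one row per round and one column per watchlist entry and metric. Other releases may name the watchlist argument differently or store the log elsewhere, so the names evaluation_log and eval_merror below should be checked against the installed version.

```r
library(xgboost)

data(iris)
set.seed(1)
index = sample(1:150)
indexTrain = index[1:100]
indexValid = index[101:150]
train = xgb.DMatrix(as.matrix(iris[indexTrain, 1:4]),
                    label=as.numeric(iris[indexTrain, "Species"]) - 1)
valid = xgb.DMatrix(as.matrix(iris[indexValid, 1:4]),
                    label=as.numeric(iris[indexValid, "Species"]) - 1)

# eval_metric is set explicitly so that the log columns are named *_merror
param = list(eta=0.1, objective="multi:softprob", num_class=3,
             eval_metric="merror")
model = xgb.train(params=param, data=train, nrounds=40,
                  watchlist=list(eval=valid, train=train), verbose=0)

# Assumption: this release stores the per-round history on the booster
evalLog = as.data.frame(model$evaluation_log)
validError = evalLog[["eval_merror"]]   # one value per boosting round
bestRound = which.min(validError)       # first round with the lowest error
```

Here validError is the vector that was only visible in the console, and bestRound is one simple candidate for nrounds.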

QUESTION 2: How can I retrieve the vector of standard errors of the individual validation errors, like the ones produced by xgb.cv?
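
On QUESTION 2, a single validation set gives no fold-to-fold spread, so nothing corresponds directly to the std columns of xgb.cv. One rough substitute, which is an assumption of this sketch and not something xgboost reports, is the binomial standard error of each error rate, sqrt(p * (1 - p) / n), where n is the validation-set size. The helper name merrorSE is hypothetical, and the error values are the eval-merror figures printed above.

```r
# Hypothetical helper: binomial standard error of a misclassification rate p
# estimated on n independent validation rows.
merrorSE = function(p, n) sqrt(p * (1 - p) / n)

# eval-merror values from the first four rounds of the console output above
validError = c(0.06, 0.06, 0.06, 0.06)
nValid = 50   # rows in the validation set (index[101:150])

validSE = merrorSE(validError, nValid)
print(round(validSE, 4))
```

This treats the validation rows as independent draws and ignores the variability that comes from the choice of training sample, so it is not interchangeable with the cross-validation standard deviation.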

EDIT1: In lines 58 and 59 here, the author seems to be able to extract the validation error. However, I have not been able to adapt that approach to the iris dataset.

EDIT2: Another (unanswered) closely related question here.

Source: https://stackoverflow.com/questions/38815666/tuning-xgboost-with-xgb-train-providing-a-validation-set-in-r

Tags

machine-learning

cross-validation

xgboost