Tuning xgboost with xgb.train providing a validation set in R

Posted by 南楼画角 on 2020-01-04 01:26:29

Question


Related questions here and here. The common way of tuning xgboost (i.e. choosing nrounds) is to use xgb.cv, which performs k-fold cross-validation, for example:

library(xgboost)
data(iris)
set.seed(1)
index = sample(1:150)  # shuffle the rows
X = as.matrix(iris[index, 1:4])
y = as.matrix(as.numeric(iris[index, "Species"])) - 1  # class labels must start at 0
# num_class is a booster parameter, so it goes in the params list
param = list(eta=0.1, objective="multi:softprob", num_class=3)
xgb.cv(params=param, data=X, nrounds=50, nfold=5, label=y)
> train.merror.mean train.merror.std test.merror.mean test.merror.std
> 1:          0.021667         0.009501         0.040000        0.043461
> 2:          0.018333         0.006972         0.033333        0.047141
> 3:          0.018333         0.006972         0.033333        0.047141
> 4:          0.018333         0.006972         0.033333        0.047141

However, I want to tune xgboost using a fixed validation set, which is not possible with xgb.cv. It seems that this can be achieved with xgb.train:

library(xgboost)
data(iris)
set.seed(1)
index = sample(1:150)
# First 100 shuffled rows for training, remaining 50 for validation
indexTrain = index[1:100]
indexValid = index[101:150]
Xtrain = as.matrix(iris[indexTrain, 1:4])
Xvalid = as.matrix(iris[indexValid, 1:4])
yTrain = as.numeric(iris[indexTrain, "Species"]) - 1
yValid = as.numeric(iris[indexValid, "Species"]) - 1
train = xgb.DMatrix(Xtrain, label=yTrain)
valid = xgb.DMatrix(Xvalid, label=yValid)
# num_class is a booster parameter, so it goes in the params list
param = list(eta=0.1, objective="multi:softprob", num_class=3)
watchlist = list(eval=valid, train=train)
model = xgb.train(params=param, data=train, nrounds=40, watchlist=watchlist)
>[0]    eval-merror:0.060000    train-merror:0.020000
>[1]    eval-merror:0.060000    train-merror:0.030000
>[2]    eval-merror:0.060000    train-merror:0.020000
>[3]    eval-merror:0.060000    train-merror:0.020000

Indeed, while training with xgb.train the evaluation error is printed to the console at every round. However, this information seems to be lost afterwards, since the only attributes of model are handle and raw.

QUESTION 1: How to retrieve the vector of the validation error printed in the console?
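
For reference, a minimal sketch of one way this can work. It assumes an xgboost R release in which xgb.train accepts watchlist and stores the per-round metrics in the evaluation_log element of the returned booster (roughly versions 0.6 through 1.7; the interface differs in other releases). The variable names and the explicit eval_metric are choices made for this sketch, not part of the original code.

```r
library(xgboost)

data(iris)
set.seed(1)
index <- sample(1:150)
idx_train <- index[1:100]
idx_valid <- index[101:150]

train <- xgb.DMatrix(as.matrix(iris[idx_train, 1:4]),
                     label = as.numeric(iris[idx_train, "Species"]) - 1)
valid <- xgb.DMatrix(as.matrix(iris[idx_valid, 1:4]),
                     label = as.numeric(iris[idx_valid, "Species"]) - 1)

# eval_metric is set explicitly so the log column names are predictable.
param <- list(eta = 0.1, objective = "multi:softprob",
              num_class = 3, eval_metric = "merror")

model <- xgb.train(params = param, data = train, nrounds = 40,
                   watchlist = list(eval = valid, train = train),
                   verbose = 0)

# Assumption: this package version keeps the metrics in evaluation_log,
# one row per boosting round and one column per watchlist entry and metric.
eval_log <- model$evaluation_log
valid_error <- eval_log$eval_merror   # validation error, one value per round
best_round <- which.min(valid_error)  # first round with the lowest error
```

Under these assumptions valid_error holds the same numbers that are printed in the eval-merror column, and best_round is a candidate value for nrounds.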

QUESTION 2: How to retrieve the vector of the standard errors of the individual validation errors, such as the ones produced by xgb.cv?
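
Note that the std columns of xgb.cv are standard deviations across folds, so a single validation set has no direct equivalent. One possible substitute, offered here as an assumption and not as what xgb.cv computes, is the binomial standard error of an error rate estimated from n validation cases. The sketch below uses the eval-merror values printed in the console output above and the validation set size of 50; the variable names are illustrative.

```r
# Validation errors as printed in the console output (first four rounds).
valid_error <- c(0.06, 0.06, 0.06, 0.06)
n_valid <- 50  # size of the validation set (rows 101:150 of the shuffle)

# Binomial standard error of a misclassification rate p estimated
# from n_valid independent cases: sqrt(p * (1 - p) / n_valid).
# This is NOT the across-fold standard deviation reported by xgb.cv.
valid_se <- sqrt(valid_error * (1 - valid_error) / n_valid)
print(round(valid_se, 4))
```

This treats each validation case as an independent Bernoulli trial, which ignores the variability caused by the choice of training set.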

EDIT1: In lines 58 and 59 here it seems that the author is able to extract the validation error. However, I'm not able to adapt that code to do the same with the iris dataset.

EDIT2: Another (unanswered), closely related question is here.

Source: https://stackoverflow.com/questions/38815666/tuning-xgboost-with-xgb-train-providing-a-validation-set-in-r
