I'm using the R wrapper for XGBoost. In the function xgb.cv, there is a folds
parameter with the description:
list provides a po
Through some trial and error I figured out that xgboost
treats the passed indices as the indices of the test (held-out) folds. I confirmed this when I noticed that the current devel version of xgboost
states it explicitly in the documentation.
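To make the semantics concrete, here is a minimal base R sketch on toy data (the names n.rows, toy.folds and train.rows are made up for illustration). It only shows what "test fold indices" implies: each list element holds the held-out rows, and the training rows for that fold are assumed to be everything else.

```r
# Toy setup: 6 rows divided into 3 held-out (test) folds.
n.rows <- 6
toy.folds <- list(c(1, 2), c(3, 4), c(5, 6))

# For each fold, the implied training rows are all rows not held out.
train.rows <- lapply(toy.folds, function(test.idx) {
  setdiff(seq_len(n.rows), test.idx)
})

# Inspect the implied training rows for the first fold.
print(train.rows[[1]])
```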
Here is an example for both generating the folds and using them.
Assume our data frame has a column of ids, and we want all rows with a given id value to end up in the same fold.
The code below
iterates over the ids, creating a list whose elements are the row indices that match each id:
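# One fold per distinct id value
fold.ids <- unique(df$id)
custom.folds <- vector("list", length(fold.ids))
for (i in seq_along(fold.ids)) {
  # Row indices whose id matches the i-th distinct id (the held-out rows)
  custom.folds[[i]] <- which(df$id %in% fold.ids[i])
}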
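As an aside, the same kind of list can probably be built without an explicit loop. The sketch below uses base R's split() on a toy data frame (df.toy and folds.by.id are hypothetical names). One difference to be aware of: split() orders the folds by sorted id value rather than by order of first appearance, which should not matter for cross-validation.

```r
# Toy data frame standing in for df; `id` is the grouping column.
df.toy <- data.frame(id = c("a", "b", "a", "c", "b"), x = 1:5)

# split() groups the row numbers by id; unname() drops the id labels
# so the result is a plain list of integer vectors.
folds.by.id <- unname(split(seq_len(nrow(df.toy)), df.toy$id))

print(folds.by.id)
```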
Here is an example using the above fold list in xgb.cv:
res <- xgb.cv(param, dtrain, nround, folds=custom.folds, prediction = TRUE)
Reasonable values for the other xgb.cv
parameters can be found in the documentation.
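If there are many distinct ids, one fold per id can mean a lot of folds. A possible variation, sketched here in base R on toy data (df.toy, k, fold.of.id and grouped.folds are illustrative names, not part of the original code), is to assign whole ids at random to k folds so that rows sharing an id still stay together.

```r
set.seed(1)  # reproducible random assignment

# Toy data: 10 ids with 3 rows each.
df.toy <- data.frame(id = rep(1:10, each = 3))
k <- 5

ids <- unique(df.toy$id)

# Give every id a fold number in 1..k, balanced and in random order.
fold.of.id <- sample(rep_len(seq_len(k), length(ids)))

# Look up each row's fold via its id, then collect row indices per fold.
row.fold <- fold.of.id[match(df.toy$id, ids)]
grouped.folds <- unname(split(seq_len(nrow(df.toy)), row.fold))
```

The resulting list has the same shape as custom.folds, so it could be passed to the folds argument in the same way.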
This worked best for me:
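# createFolds() returns held-out (test) indices by default (returnTrain = FALSE)
custom.folds <- caret::createFolds(data$Label, k = 10, list = TRUE)
xgbcv <- xgb.cv(
  params = params,
  data = df,
  # note: xgb.cv also expects nrounds (the number of boosting rounds)
  maximize = FALSE,
  prediction = TRUE,
  metrics = "logloss",
  folds = custom.folds
)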