xgboost in R: how does xgb.cv pass the optimal parameters into xgb.train

梦毁少年i · 2020-12-22 20:20

I've been exploring the xgboost package in R and went through several demos as well as tutorials, but this still confuses me: after using xgb.cv to find the best parameters, how do they get passed into xgb.train?

3 Answers
  •  Happy的楠姐
    2020-12-22 20:50

    I found silo's answer very helpful. In addition to his random-search approach, you may want to use Bayesian optimization to speed up the hyperparameter search, e.g. via the rBayesianOptimization package. The following is my code using rBayesianOptimization.

    library(xgboost)
    library(rBayesianOptimization)

    # dtrain is an xgb.DMatrix built from my training data elsewhere in the script;
    # dataFTR$isPreIctalTrain holds the training labels, and seedNum / verbose are
    # set earlier as well.
    cv_folds <- KFold(dataFTR$isPreIctalTrain, nfolds = 5,
                      stratified = FALSE, seed = seedNum)

    xgb_cv_bayes <- function(max.depth, min_child_weight, subsample, eta,
                             gamma, colsample_bytree, max_delta_step) {
      param <- list(booster = "gbtree",
                    max_depth = max.depth,
                    min_child_weight = min_child_weight,
                    eta = eta, gamma = gamma,
                    subsample = subsample, colsample_bytree = colsample_bytree,
                    max_delta_step = max_delta_step,
                    lambda = 1, alpha = 0,
                    objective = "binary:logistic",
                    eval_metric = "auc")
      cv <- xgb.cv(params = param, data = dtrain, folds = cv_folds,
                   nrounds = 1000, early_stopping_rounds = 10,
                   maximize = TRUE, verbose = verbose)

      # We don't need the cross-validation predictions, but we do need the optimal
      # number of rounds. A workaround is to return best_iteration in the Pred slot,
      # which rBayesianOptimization expects by default.
      list(Score = cv$evaluation_log$test_auc_mean[cv$best_iteration],
           Pred = cv$best_iteration)
    }

    OPT_Res <- BayesianOptimization(xgb_cv_bayes,
                                    bounds = list(max.depth = c(3L, 10L),
                                                  min_child_weight = c(1L, 40L),
                                                  subsample = c(0.6, 0.9),
                                                  eta = c(0.01, 0.3),
                                                  gamma = c(0.0, 0.2),
                                                  colsample_bytree = c(0.5, 0.8),
                                                  max_delta_step = c(1L, 10L)),
                                    init_grid_dt = NULL, init_points = 10, n_iter = 10,
                                    acq = "ucb", kappa = 2.576, eps = 0.0,
                                    verbose = verbose)

    best_param <- list(
      booster = "gbtree",
      eval_metric = "auc",
      objective = "binary:logistic",
      max_depth = OPT_Res$Best_Par["max.depth"],
      eta = OPT_Res$Best_Par["eta"],
      gamma = OPT_Res$Best_Par["gamma"],
      subsample = OPT_Res$Best_Par["subsample"],
      colsample_bytree = OPT_Res$Best_Par["colsample_bytree"],
      min_child_weight = OPT_Res$Best_Par["min_child_weight"],
      max_delta_step = OPT_Res$Best_Par["max_delta_step"])

    # The number of rounds should also be tuned with CV
    # (https://www.hackerearth.com/practice/machine-learning/machine-learning-algorithms/beginners-tutorial-on-xgboost-parameter-tuning-r/tutorial/),
    # but nrounds cannot be returned directly by BayesianOptimization.
    # OPT_Res$Pred, normally reserved for cross-validation predictions, is used
    # here to recover the number of rounds of the best run.
    nrounds <- OPT_Res$Pred[[which.max(OPT_Res$History$Value)]]
    xgb_model <- xgb.train(params = best_param, data = dtrain, nrounds = nrounds)
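
    For completeness, below is a minimal sketch of how dtrain might be constructed and how the fitted booster could be used afterwards. The names trainMatrix, trainLabels and testMatrix are placeholders I'm assuming here, not objects from the code above.

    # Hypothetical setup: a numeric feature matrix and a 0/1 label vector.
    # Replace trainMatrix / trainLabels / testMatrix with your own data.
    library(xgboost)

    dtrain <- xgb.DMatrix(data = trainMatrix, label = trainLabels)

    # Once xgb.train has produced xgb_model (an xgb.Booster), predict() returns
    # probabilities for the binary:logistic objective.
    dtest      <- xgb.DMatrix(data = testMatrix)
    pred_prob  <- predict(xgb_model, dtest)
    pred_class <- as.integer(pred_prob > 0.5)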
    
