xgboost

xgb.fi() function detecting interactions and working with xgboost returns exception

Posted by £可爱£侵袭症+ on 2019-12-13 14:08:11
Question: xgb.fi() is a new function that works with xgboost to detect interactions between variables. The documentation can be found here: https://rdrr.io/github/RSimran/RXGBfi/man/xgb.fi.html This is an important subject, and I tried to test the function but ran into an exception. See below for a reproducible example:

```r
library(data.table)
library(xgboost)
library(RXGBfi)

data(mtcars)
X = as.matrix(mtcars[, -9])
Y = mtcars$am
dtrain = xgb.DMatrix(data = X, label = Y)
model = xgb.train(data = dtrain,
```

XgBoost script is not outputting binary properly

Posted by 偶尔善良 on 2019-12-13 02:57:32
Question: I'm learning to use xgboost, and I have read through the documentation! However, I don't understand why the output of my script comes out between 0 and ~2. At first I thought it should be either 0 or 1, since it's a binary classification, but then I read that it comes out as a probability of 0 or 1. However, some outputs are 1.5+ (at least in the CSV), which doesn't make sense to me! I'm unsure whether the problem is in the xgboost parameters or in the CSV creation. This line, np.expm1(preds), I'm not
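The 0-to-~2 range is exactly what you would see if np.expm1 were applied to probabilities: binary:logistic predictions lie in [0, 1], and expm1(1) = e - 1 ≈ 1.72. A minimal sketch of that effect (plain Python, using math.expm1 in place of np.expm1; the probability values are hypothetical):

```python
import math

# binary:logistic predictions are probabilities in [0, 1]
probs = [0.0, 0.5, 0.99]

# Applying expm1 stretches them to [0, e - 1 ~ 1.718], which matches
# outputs "between 0 and ~2" landing in the CSV.
transformed = [math.expm1(p) for p in probs]
print(transformed)  # last value ~ 1.691

# To get hard 0/1 labels, threshold the raw probabilities instead:
labels = [int(p >= 0.5) for p in probs]
print(labels)  # [0, 1, 1]
```

If that is the cause, the fix is to drop the expm1 call (it is the inverse of a log1p transform, which makes sense for regression targets, not class probabilities) and threshold the raw predictions at 0.5.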

XGBOOST - DMATRIX

Posted by 假如想象 on 2019-12-12 21:40:18
Question: I pulled some ML code that ran on Kaggle (Linux) and tried to run it in a Jupyter notebook on a Windows machine. Here is the code (some of it):

```python
##### RUN XGBOOST
import xgboost as xgb

print("\nSetting up data for XGBoost ...")

# xgboost params
xgb_params = {
    'eta': 0.037,
    'max_depth': 5,
    'subsample': 0.80,
    'objective': 'reg:linear',
    'eval_metric': 'mae',
    'lambda': 0.8,
    'alpha': 0.4,
    'base_score': y_mean,
    'silent': 1
}

#### These lines were causing the following error on 9/1/2017:
#
```

Windows xgboost error

Posted by 断了今生、忘了曾经 on 2019-12-12 19:57:30
Question: It was a pain just to install the xgboost library, but now another error happens, on Windows 8.1 64-bit:

```python
import xgboost as xgb
```

```
Traceback (most recent call last):
  File "C:/Users/Пашка/PycharmProjects/kaggler bank santander/1.py", line 12, in <module>
    import xgboost as xgb
  File "C:\Python34\lib\site-packages\xgboost-0.4-py3.4.egg\xgboost\__init__.py", line 11, in <module>
    from .core import DMatrix, Booster
  File "C:\Python34\lib\site-packages\xgboost-0.4-py3.4.egg\xgboost\core.py", line 83, in
```

Setting Tol for XGBoost Early Stopping

Posted by 断了今生、忘了曾经 on 2019-12-12 18:03:37
Question: I am using XGBoost with early stopping. After about 1000 epochs, the model is still improving, but the magnitude of improvement is very low. I.e.:

```python
clf = xgb.train(params, dtrain, num_boost_round=num_rounds, evals=watchlist, early_stopping_rounds=10)
```

Is it possible to set a "tol" for early stopping, i.e. the minimum level of improvement required to not trigger early stopping? tol is a common parameter in scikit-learn models, such as MLPClassifier and QuadraticDiscriminantAnalysis. Thanks.
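One way to get tol-like behavior is to track the evaluation history yourself and stop once the best score stops improving by at least a threshold; newer xgboost releases also expose a min_delta argument on the xgboost.callback.EarlyStopping callback for the same purpose. A self-contained sketch of the stopping rule (plain Python, hypothetical validation scores, lower is better):

```python
def should_stop(history, rounds=10, tol=1e-4):
    """Return True when the best score has not improved by at least
    `tol` over the last `rounds` evaluations (lower score = better)."""
    if len(history) <= rounds:
        return False
    best_before = min(history[:-rounds])
    best_recent = min(history[-rounds:])
    return best_before - best_recent < tol

# Example: validation error keeps shrinking, but by less than tol,
# so training would be cut off here.
scores = [0.50, 0.40, 0.30, 0.2999, 0.2998, 0.2997]
print(should_stop(scores, rounds=3, tol=1e-2))  # True: recent gains < 0.01
```

With xgb.train you would feed it the metric values collected via evals_result and break out of a manual boosting loop (or rely on min_delta where your version supports it).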

Can someone explain how these scores are derived in this XGBoost trees?

Posted by 霸气de小男生 on 2019-12-12 10:06:20
Question: I am looking at the image below. Can someone explain how these scores are calculated? I thought it was -1 for a "no" and +1 for a "yes", but then I can't figure out how the little girl gets 0.1. And that doesn't work for tree 2 either.

Answer 1: The values of the leaf elements (aka "scores") - +2, +0.1, -1, +0.9 and -0.9 - were devised by the XGBoost algorithm during training. In this case, the XGBoost model was trained using a dataset where little boys (+2) appear somehow "greater" than little girls (+0.1).
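To see how the leaf scores are actually used: a sample's raw prediction is the sum of the scores of the leaves it reaches in every tree, and for a logistic objective that sum is then squashed into a probability. A worked sketch with the leaf values quoted above (which tree assigns which leaf to which person is my reading of the classic XGBoost intro figure):

```python
import math

# Raw score = SUM over trees of the leaf the sample lands in.
boy_score = 2.0 + 0.9          # tree 1 leaf +2, tree 2 leaf +0.9
grandpa_score = -1.0 + (-0.9)  # tree 1 leaf -1, tree 2 leaf -0.9

print(boy_score)      # 2.9
print(grandpa_score)  # -1.9

# For a binary objective the raw score is passed through the
# logistic function to get a probability:
prob = 1.0 / (1.0 + math.exp(-boy_score))
print(round(prob, 3))  # ~ 0.948
```

So no single leaf is "the answer"; the individual values like +0.1 only make sense as additive contributions to the ensemble total.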

python xgboost continue training on existing model

Posted by 为君一笑 on 2019-12-12 08:13:59
Question: Let's say I build an xgboost model:

```python
bst = xgb.train(param0, dtrain1, num_round, evals=[(dtrain, "training")])
```

where param0 is a set of params for xgb, dtrain1 is a DMatrix ready to be trained on, and num_round is the number of rounds. Then I save the model to disk:

```python
bst.save_model("xgbmodel")
```

Later on, I want to reload the saved model and continue training it with dtrain2. Does anyone have an idea how to do it?

Answer 1: You don't even have to load the model from the disk and retrain. All you need to do is
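For context on why resuming is possible at all: boosting is additive, so a saved model is just a fixed prefix of base learners, and "continuing training" means fitting more learners on the current ensemble's residuals. A toy illustration with constant-value learners (plain Python, not the xgboost API; with the real library, xgb.train accepts an xgb_model argument taking a Booster or a saved-model path for this):

```python
def fit_round(y, preds, lr=0.5):
    # One boosting round: a "learner" that predicts the mean residual.
    residual_mean = sum(yi - pi for yi, pi in zip(y, preds)) / len(y)
    return lr * residual_mean

def predict(model, n):
    # The ensemble prediction is the sum of all learners.
    return [sum(model)] * n

y = [1.0, 1.0, 1.0]
model = []
for _ in range(5):
    model.append(fit_round(y, predict(model, len(y))))

resumed = list(model)            # "save_model" / reload
for _ in range(5):               # resume: keep old learners, append new ones
    resumed.append(fit_round(y, predict(resumed, len(y))))

print(predict(model, 1)[0])    # 0.96875  (1 - 0.5**5)
print(predict(resumed, 1)[0])  # closer to the target 1.0
```

The resumed ensemble picks up exactly where the saved one left off, which is why no retraining from scratch is needed.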

understanding python xgboost cv

Posted by 旧巷老猫 on 2019-12-12 07:12:37
Question: I would like to use the xgboost cv function to find the best parameters for my training data set. I am confused by the API. How do I find the best parameters? Is this similar to sklearn's grid_search cross-validation function? How can I find which of the options for the max_depth parameter ([2, 4, 6]) was determined optimal?

```python
from sklearn.datasets import load_iris
import xgboost as xgb

iris = load_iris()
DTrain = xgb.DMatrix(iris.data, iris.target)
x_parameters = {"max_depth": [2, 4, 6]}
xgb.cv(x
```
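Unlike sklearn's GridSearchCV, xgb.cv scores a single parameter configuration; it does not search over lists, so passing {"max_depth": [2, 4, 6]} will not compare the three values. Instead you loop over the candidates yourself and keep the one with the lowest cross-validated error. A sketch of that loop, where cv_error is a hypothetical stand-in for "run xgb.cv with these params and read the final mean test metric":

```python
def cv_error(params):
    # Stand-in for: xgb.cv(params, DTrain, ...) followed by reading
    # the last row of the returned test-metric column.  The numbers
    # are made up to illustrate depth 4 winning.
    return {2: 0.21, 4: 0.17, 6: 0.19}[params["max_depth"]]

candidates = [{"max_depth": d} for d in (2, 4, 6)]
best = min(candidates, key=cv_error)
print(best)  # {'max_depth': 4}
```

With the real library, each cv_error call would be one xgb.cv run, and the same loop extends naturally to a full grid over several parameters.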

XGBoost prediction always returning the same value - why?

Posted by 心已入冬 on 2019-12-12 07:05:28
Question: I'm using SageMaker's built-in XGBoost algorithm with the following training and validation sets: https://files.fm/u/pm7n8zcm The prediction model that comes out of training with the above datasets always produces the exact same result. Is there something obvious in the training or validation datasets that could explain this behavior? Here is an example code snippet where I'm setting the hyperparameters:

```
{
    {"max_depth", "1000"},
    {"eta", "0.001"},
    {"min_child_weight", "10"},
    {
```

Mapping the index of the feature importances to the index of columns in a dataframe

Posted by 匆匆过客 on 2019-12-12 04:25:18
Question: Hello, I plotted a graph using feature_importance from xgboost. However, the graph shows "f-values", so I do not know which feature is being represented in the graph. One way I heard of to solve this is to map the index of the features within my dataframe to the index of the feature_importance "f-values" and select the columns manually. How do I go about doing this? Also, if there is another way of doing this, help would truly be appreciated. Here is my code below: feature
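The "f-values" (f0, f1, ...) simply index the columns of the training matrix in order, so fN corresponds to the N-th column of the dataframe used to build it. A minimal mapping sketch (the column names and importance values below are hypothetical):

```python
# Column order of the dataframe the model was trained on.
columns = ["age", "income", "tenure"]

# Importance keyed by fN, as returned by get_fscore()-style APIs.
# Features never used in a split may be missing entirely.
importance = {"f0": 12.0, "f2": 7.5}

# "f0" -> columns[0], "f2" -> columns[2], etc.
named = {columns[int(k[1:])]: v for k, v in importance.items()}
print(named)  # {'age': 12.0, 'tenure': 7.5}
```

Another route is to supply feature_names when constructing the DMatrix (or train via the sklearn wrapper on a dataframe), in which case the importance plot should use the real column names directly.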