xgboost

xgboost: study notes and summary

生来就可爱ヽ(ⅴ<●) submitted on 2019-12-20 17:24:09
I've been studying xgboost recently, and this post pulls together some xgboost knowledge. It is mostly a collection of pointers to existing resources; little of it is original.

Theory

For the theory, start with the slides (ppt) by xgboost's author, Tianqi Chen. Readers less comfortable with English can read the blog post "xgboost原理" (xgboost principles); if Chen's slides still leave you dizzy, that post should give you a rough picture of how xgboost finds its optimal solution.

Practice

xgboost has an almost absurd number of parameters. The "xgboost原理" post mentioned above links to three posts on tuning strategy. The English one its author recommends has a well-translated Chinese version, "XGBoost参数调优完全指南(附Python代码)" (A complete guide to XGBoost parameter tuning, with Python code), which I strongly recommend. The Python API docs are linked there as well.

Common questions

What are the differences between GBDT and XGBoost? The points below come from wepon's answer to the Zhihu question "机器学习算法中GBDT和XGBOOST的区别有哪些".

Traditional GBDT uses CART as the base learner; xgboost additionally supports linear base learners, in which case xgboost amounts to L1- and L2-regularized logistic regression (for classification) or linear regression (for regression).

Traditional GBDT uses only first-order derivative information when optimizing; xgboost applies a second-order Taylor expansion to the cost function and uses both first- and second-order derivatives. Incidentally, xgboost supports custom cost functions, as long as the function has first and second derivatives (a sketch follows below).

xgboost adds a regularization term to the cost function to control model complexity.
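Since the custom-objective hook comes up above: a minimal sketch of what xgboost expects from a custom cost function, namely its first derivative (grad) and second derivative (hess) with respect to the prediction. The squared-error loss is my own illustrative choice, and dtrain is assumed to be an existing DMatrix:

import numpy as np
import xgboost as xgb

def squared_error_obj(preds, dtrain):
    labels = dtrain.get_label()
    grad = preds - labels        # first derivative of 0.5 * (pred - label)^2
    hess = np.ones_like(preds)   # second derivative is the constant 1
    return grad, hess

# pass the objective via the obj argument of xgb.train
bst = xgb.train({'max_depth': 3}, dtrain, num_boost_round=50, obj=squared_error_obj)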

How to change size of plot in xgboost.plot_importance?

人盡茶涼 submitted on 2019-12-20 15:23:41
Question:

xgboost.plot_importance(model, importance_type='gain')

I am not able to change the size of this plot. I want to save the figure at a proper size so that I can use it in a PDF, something like matplotlib's figsize.

Answer 1: It looks like plot_importance returns an Axes object:

ax = xgboost.plot_importance(...)
fig = ax.figure
fig.set_size_inches(h, w)

It also looks like you can pass an axes in:

fig, ax = plt.subplots(figsize=(h, w))
xgboost.plot_importance(..., ax=ax)

Source: https://stackoverflow.com/questions/40664776
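Putting the second approach together end to end, as a sketch: the model, X and y are assumed to exist, and the figure size and file name are arbitrary choices of mine:

import matplotlib.pyplot as plt
import xgboost as xgb

model = xgb.XGBRegressor().fit(X, y)     # assumed training data
fig, ax = plt.subplots(figsize=(10, 6))  # the size you actually want
xgb.plot_importance(model, importance_type='gain', ax=ax)
fig.savefig('importance.pdf', bbox_inches='tight')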

ld: library not found for -lomp

旧城冷巷雨未停 submitted on 2019-12-19 03:37:12
Question: On macOS Sierra, building xgboost with OpenMP enabled always fails. Following https://xgboost.readthedocs.io/en/latest/build.html , I've tried:

cp make/config.mk ./config.mk; make -j4

with:

export CC=/usr/local/Cellar/llvm/4.0.0_1/bin/clang
export CXX=/usr/local/Cellar/llvm/4.0.0_1/bin/clang++
export CXX1X=/usr/local/Cellar/llvm/4.0.0_1/bin/clang++

It fails with the following (the output of two parallel compiler processes is interleaved, and the quote is cut off in the original):

clang-4.0: warning: argument unused during compilation: '-pthread' [-Wunused-command-line-argument]
clang-4.0: warning: argument

XGBoost - Poisson distribution with varying exposure / offset

我们两清 submitted on 2019-12-18 16:45:31
Question: I am trying to use XGBoost to model claim frequency on data generated from exposure periods of unequal length, but I have been unable to get the model to treat the exposure correctly. I would normally do this by setting log(exposure) as an offset. Is that possible in XGBoost? (A similar question was posted here: xgboost, offset exposure?) To illustrate the issue, the R code below generates some data with the fields:

x1, x2 - factors (either 0 or 1)
exposure - length of policy period on
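The question breaks off above. For reference, one mechanism xgboost does expose for this (a sketch of my own, not from the original post) is the DMatrix base margin: with the count:poisson objective, the base margin is added to the raw log-scale score, so log(exposure) plays exactly the offset role. All variable names here are assumed:

import numpy as np
import xgboost as xgb

# X: features, y: claim counts, exposure: policy period lengths (all assumed)
dtrain = xgb.DMatrix(X, label=y)
dtrain.set_base_margin(np.log(exposure))   # offset on the log scale

bst = xgb.train({'objective': 'count:poisson'}, dtrain, num_boost_round=100)

# new data needs its own base margin at prediction time
dtest = xgb.DMatrix(X_new)
dtest.set_base_margin(np.log(exposure_new))
pred = bst.predict(dtest)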

ValueError: feature_names mismatch in xgboost's predict() function

杀马特。学长 韩版系。学妹 submitted on 2019-12-18 13:03:35
Question: I have trained an XGBRegressor model. When I use this trained model to predict on a new input, the predict() function throws a feature_names mismatch error, although the input feature vector has the same structure as the training data. Also, in order to build the feature vector in the same structure as the training data, I am doing a lot of inefficient processing, such as adding new empty columns (if the data does not exist) and then rearranging the data columns so that it matches
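The question is cut off above, but the column juggling it describes can usually be collapsed into one call. A sketch, assuming the data lives in pandas DataFrames and that train_columns is a saved list of the training column order (both assumptions of mine):

import pandas as pd

# create any missing columns (filled with 0), drop extras,
# and put everything in the training order in a single step
X_new = X_new.reindex(columns=train_columns, fill_value=0)
pred = model.predict(X_new)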

How to get feature importance in xgboost?

社会主义新天地 submitted on 2019-12-18 10:54:12
Question: I'm using xgboost to build a model and trying to find the importance of each feature using get_fscore(), but it returns {}. My training code is:

dtrain = xgb.DMatrix(X, label=Y)
watchlist = [(dtrain, 'train')]
param = {'max_depth': 6, 'learning_rate': 0.03}
num_round = 200
bst = xgb.train(param, dtrain, num_round, watchlist)

So is there any mistake in my training? How do I get feature importance in xgboost?

Answer 1: In your code you can get feature importance for each feature in dict form: bst.get
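The answer breaks off at bst.get. For completeness, a sketch of the Booster importance API (my addition, not part of the original answer): get_score returns a dict keyed by feature name, and importance_type selects the metric.

scores = bst.get_score(importance_type='weight')  # feature -> number of times used to split
gains = bst.get_score(importance_type='gain')     # feature -> average gain of its splits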

GridSearchCV - XGBoost - Early Stopping

我的梦境 submitted on 2019-12-18 10:37:14
Question: I am trying to do a hyperparameter search on XGBoost using scikit-learn's GridSearchCV. During the grid search I'd like it to stop early, since that reduces search time drastically and I expect it to give better results on my prediction/regression task. I am using XGBoost via its scikit-learn API.

model = xgb.XGBRegressor()
GridSearchCV(model, paramGrid, verbose=verbose,
             fit_params={'early_stopping_rounds': 42},
             cv=TimeSeriesSplit(n_splits=cv).get_n_splits([trainX, trainY]),
             n_jobs=n_jobs, iid=iid)
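One detail worth flagging, as a sketch rather than a verified fix: in xgboost's sklearn wrapper, early_stopping_rounds only does anything when an eval_set is supplied, and with modern scikit-learn the fit-time keywords go to GridSearchCV.fit rather than a fit_params constructor argument. paramGrid, trainX/trainY, valX/valY and the split count are assumed names:

from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
import xgboost as xgb

model = xgb.XGBRegressor()
search = GridSearchCV(model, paramGrid, cv=TimeSeriesSplit(n_splits=5))

# early stopping watches eval_set; without one, early_stopping_rounds is inert
search.fit(trainX, trainY,
           early_stopping_rounds=42,
           eval_set=[(valX, valY)])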

Using XGBOOST in c++

落爺英雄遲暮 submitted on 2019-12-18 10:29:40
Question: How can I use the XGBoost library (https://github.com/dmlc/xgboost/) from C++? I have found Python and Java APIs, but I can't find one for C++.

Answer 1: I ended up using the C API; see the example below (cut off in the original):

// create the train data
int cols = 3, rows = 5;
float train[rows][cols];
for (int i = 0; i < rows; i++)
    for (int j = 0; j < cols; j++)
        train[i][j] = (i + 1) * (j + 1);

float train_labels[rows];
for (int i = 0; i < rows; i++)
    train_labels[i] = 1 + i * i * i;

// convert to DMatrix
DMatrixHandle h_train[1];
XGDMatrixCreateFromMat((float *

Error running an exe built with pyinstaller when the code uses the xgboost package

匆匆过客 submitted on 2019-12-18 07:24:10
Question: I have code for predicting some value that uses the xgboost package. When I run it in PyCharm, it runs as expected. The problem is when I make an executable file using pyinstaller: it builds the exe without any error, but when I run the exe the following error is raised (cut off in the original):

Traceback (most recent call last):
  File "test_fraud.py", line 3, in <module>
    import xgboost
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 967, in _find

AttributeError: module 'xgboost' has no attribute 'XGBRegressor'

一笑奈何 submitted on 2019-12-18 07:23:21
Question: I am trying to run xgboost from Spyder with Python, but I keep getting this error:

AttributeError: module 'xgboost' has no attribute 'XGBRegressor'

Here is the code:

import xgboost as xgb
xgb.XGBRegressor(max_depth=3, learning_rate=0.1, n_estimators=100, silent=True,
                 objective='reg:linear', gamma=0, min_child_weight=1,
                 max_delta_step=0, subsample=1, colsample_bytree=1,
                 seed=0, missing=None)

The error is:

Traceback (most recent call last):
  File "<ipython-input-33-d257a9a2a5d8>", line 1, in <module>
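The traceback is cut off and no answer is included above. A common cause of this particular error (my assumption, not from the original post) is a local file or directory named xgboost shadowing the installed package; a quick check:

import xgboost

print(xgboost.__file__)  # should point into site-packages, not the current working directory
print(getattr(xgboost, '__version__', 'unknown'))  # very old builds also lacked the sklearn wrapper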