xgboost

What is the difference between eval_metric and feval in xgboost?

对着背影说爱祢 submitted on 2019-12-06 00:59:28
Question: What is the difference between feval and eval_metric in xgb.train? Both parameters are only for evaluation purposes. A post on Kaggle gives some insight: https://www.kaggle.com/c/prudential-life-insurance-assessment/forums/t/18473/custom-objective-for-xgboost Answer 1: They both do roughly the same thing. eval_metric can take a string (which uses xgboost's built-in metric functions) or a user-defined function; feval only takes a function. Both are, as you noted, for evaluation purposes. In the below examples you can see
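A minimal sketch in Python showing the two options side by side (the question is about the R xgb.train, but the Python API of that era exposes the same pair; the toy data and the mae_eval helper are hypothetical, not from the question):

```python
import numpy as np
import xgboost as xgb

# Hypothetical toy data just to make the example runnable.
X = np.random.rand(100, 5)
y = (X[:, 0] > 0.5).astype(int)
dtrain = xgb.DMatrix(X, label=y)

# eval_metric: a string naming one of xgboost's built-in metrics.
params = {"objective": "binary:logistic", "eval_metric": "logloss"}

# feval: a user-defined function with signature (preds, dmatrix) -> (name, value).
def mae_eval(preds, dmatrix):
    labels = dmatrix.get_label()
    return "mae", float(np.mean(np.abs(preds - labels)))

# Both metrics are reported for every entry in evals at each boosting round.
bst = xgb.train(params, dtrain, num_boost_round=5,
                evals=[(dtrain, "train")], feval=mae_eval)
```

Here eval_metric selects a built-in metric by name, while feval supplies an arbitrary (name, value) function evaluated on every watchlist entry each round.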

Why is xgboost not plotting my trees?

梦想的初衷 submitted on 2019-12-06 00:13:57
Question: I am running an xgboost model as follows: bst <- xgb.train(data=dtrain, booster="gbtree", objective="reg:linear", max.depth=5, nround=20, watchlist=watchlist, min_child_weight=10) importance_matrix <- xgb.importance(names, model = bst) xgb.plot.importance(importance_matrix[1:10,]) The variable-importance matrix is plotted nicely, but when I do the following: xgb.plot.tree(feature_names = names, model = bst, n_first_tree = 2) RStudio opens a new browser window and shows lots of HTML, but no image. The HTML
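For comparison, the Python API draws trees through graphviz/matplotlib rather than an HTML widget, which sidesteps the browser rendering step entirely. A rough sketch, assuming the graphviz and matplotlib packages are installed; the data and model below are hypothetical stand-ins for the dtrain/bst in the question:

```python
import matplotlib.pyplot as plt
import xgboost as xgb
from sklearn.datasets import make_regression

# Hypothetical stand-in for the model trained in the question.
X, y = make_regression(n_samples=200, n_features=10, random_state=0)
dtrain = xgb.DMatrix(X, label=y)
bst = xgb.train({"objective": "reg:squarederror",  # "reg:linear" in older releases
                 "max_depth": 5, "min_child_weight": 10},
                dtrain, num_boost_round=20)

# plot_tree renders one tree (num_trees selects which) via graphviz into a
# matplotlib axis, so the result can be saved to a file instead of a browser.
xgb.plot_tree(bst, num_trees=1)
plt.savefig("tree.png", dpi=300)
```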

Differences and connections between LightGBM, XGBoost, and GBDT

一个人想着一个人 submitted on 2019-12-05 22:11:47
Reposted from https://www.cnblogs.com/mata123/p/7440774.html

GBDT. The gradient boosting tree is a more widely applicable method developed on top of the boosting tree: for regression problems, the boosting tree can be seen as a special case of the gradient boosting tree (is it also a special case for classification?). This is because at each step of building a tree, the boosting tree fits the residuals of the model obtained in the previous step on the training set. As we will see below, this residual is exactly the gradient of the loss function, which is what GBDT fits at each step.

Main idea: perform gradient descent in the function space where the objective lives, i.e. treat the function (the model) to be learned as the parameter; at each step fit the gradient of the objective with respect to the model obtained in the previous step, so that this "parameter" is updated in the direction that minimizes the objective.

Some properties: the decision tree obtained at each iteration is multiplied by a shrinkage coefficient, which reduces the contribution of each individual tree and leaves more room for later trees to learn; each iteration fits the first-order gradient.

XGBoost. XGBoost is a variant of GBDT. The biggest difference is that xgboost applies a second-order Taylor expansion to the objective, from which it derives the leaf weights of the next tree to fit (the tree structure must be determined first); the loss function then gives how much the loss decreases for each candidate split, and the attribute to split on is chosen based on that split gain. The loss formula obtained from the second-order expansion is tightly coupled with the splitting procedure: enumerate all attributes of all nodes as candidate splits; assuming a particular value of attribute a is chosen as the split point, the formula derived from the Taylor expansion gives the weight of each leaf of the resulting tree structure, from which the reduction in loss can be computed.
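A worked form of the second-order expansion referred to above (standard notation from the XGBoost paper; added here as a sketch, not part of the reposted article):

```latex
% Second-order approximation of the regularized objective at round t
% (g_i, h_i are first and second derivatives of the loss w.r.t. the previous prediction):
\mathcal{L}^{(t)} \simeq \sum_{i=1}^{n}\Bigl[ g_i f_t(x_i) + \tfrac{1}{2} h_i f_t^2(x_i) \Bigr]
  + \gamma T + \tfrac{1}{2}\lambda \sum_{j=1}^{T} w_j^2

% For a fixed tree structure, with G_j = \sum_{i \in I_j} g_i and H_j = \sum_{i \in I_j} h_i
% summed over the instances falling into leaf j, the optimal leaf weight is
w_j^{*} = -\frac{G_j}{H_j + \lambda}

% and the reduction in loss from splitting a node into left/right children is
\mathrm{Gain} = \frac{1}{2}\left[ \frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda}
  - \frac{(G_L + G_R)^2}{H_L + H_R + \lambda} \right] - \gamma
```

Enumerating candidate splits and plugging the children's G and H into the Gain formula is exactly the "compute the loss reduction for each split" step described above.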

How to enforce monotonic constraints in XGBoost with scikit-learn?

折月煮酒 submitted on 2019-12-05 21:43:12
I built an XGBoost model using scikit-learn and I am pretty happy with it. As a fine-tuning step to avoid overfitting, I'd like to enforce monotonicity on some features, but that is where I started facing difficulties... As far as I understand, there is no documentation in scikit-learn about xgboost (which I confess I am really surprised about, given that this situation has lasted for several months). The only documentation I found is directly on http://xgboost.readthedocs.io On that website, I found out that monotonicity can be enforced using the "monotone_constraints" option. I tried to use it in Scikit
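A minimal sketch of what passing that option through the scikit-learn wrapper can look like; the wrapper forwards keyword arguments to the underlying booster, and the data and feature layout here are hypothetical:

```python
import numpy as np
from xgboost import XGBRegressor

# Hypothetical data: feature 0 is assumed to act monotonically increasing
# on the target, feature 1 is left unconstrained.
rng = np.random.RandomState(0)
X = rng.rand(500, 2)
y = 3 * X[:, 0] + rng.normal(scale=0.1, size=500)

# "(1,0)": +1 enforces an increasing constraint on feature 0,
# 0 leaves feature 1 unconstrained (-1 would enforce decreasing).
model = XGBRegressor(n_estimators=100, max_depth=4,
                     monotone_constraints="(1,0)")
model.fit(X, y)
```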

xgb.plot.tree layout in r

纵饮孤独 submitted on 2019-12-05 19:25:01
I was reading an xgb notebook, and the xgb.plot.tree command in the example results in a picture like this: However, when I do the same thing I get a picture like this, which is two separate graphs, in different colors too. Is that normal? Are the two graphs two trees? Jmi47: I have the same issue. According to an issue on the xgboost GitHub repository, this could be due to a change in the DiagrammeR library used by xgboost for rendering trees: https://github.com/dmlc/xgboost/issues/2640 Instead of modifying the dgr_graph object with DiagrammeR commands, I chose to create a new version of the function

xgboost xgb.dump tree coefficient

别等时光非礼了梦想. submitted on 2019-12-05 16:03:55
I have some sample code here. data(agaricus.train, package='xgboost') train <- agaricus.train bst <- xgboost(data = train$data, label = train$label, max.depth = 2, eta = 1, nthread = 2, nround = 2, objective = "binary:logistic") xgb.dump(bst, 'xgb.model.dump', with.stats = TRUE) After building the model, I print it out as booster[0] 0:[f28<-1.00136e-05] yes=1,no=2,missing=1,gain=4000.53,cover=1628.25 1:[f55<-1.00136e-05] yes=3,no=4,missing=3,gain=1158.21,cover=924.5 3:leaf=1.71218,cover=812 4:leaf=-1.70044,cover=112.5 2:[f108<-1.00136e-05] yes=5,no=6,missing=5,gain=198.174,cover=703.75 5:leaf=-1
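For interpreting those leaf values: with objective binary:logistic, each leaf stores a raw margin (log-odds) contribution; the margins of all trees are summed and passed through the sigmoid to get the predicted probability. A Python sketch illustrating this on hypothetical data (a stand-in for the agaricus example, not the original code):

```python
import numpy as np
import xgboost as xgb
from sklearn.datasets import make_classification

# Hypothetical stand-in for the agaricus model in the question.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
dtrain = xgb.DMatrix(X, label=y)
bst = xgb.train({"objective": "binary:logistic", "max_depth": 2, "eta": 1},
                dtrain, num_boost_round=2)

# Text dump of every tree, with gain/cover statistics, like xgb.dump in R.
for i, tree in enumerate(bst.get_dump(with_stats=True)):
    print(f"booster[{i}]\n{tree}")

# Leaf values are raw margins (log-odds); summing them over all trees and
# applying the sigmoid reproduces the predicted probability.
margins = bst.predict(dtrain, output_margin=True)
probs = 1.0 / (1.0 + np.exp(-margins))
assert np.allclose(probs, bst.predict(dtrain))
```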

How to install XGBoost on OSX with multi-threading

一个人想着一个人 submitted on 2019-12-05 15:17:12
I'm trying to install xgboost on my mac (osx 10.12.1) following the guide here but I'm running into some issues. Step 1: Obtain gcc-6.x.x with openmp support via brew install gcc --without-multilib Terminal Ben$ brew install gcc --without-multilib Error: gcc-5.3.0 already installed To install this version, first `brew unlink gcc` Ben$ brew unlink gcc Unlinking /usr/local/Cellar/gcc/5.3.0... 1288 symlinks removed Ben$ brew install gcc --without-multilib [26 minutes later] ==> Summary 🍺 /usr/local/Cellar/gcc/6.2.0: 1,358 files, 238.3M, built in 26 minutes 20 seconds Step 2: Clone the repository git
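Once the build and the Python package install eventually succeed, a quick way to check that the OpenMP-enabled compiler was actually picked up is to time the same training run with different nthread values; a rough sketch with illustrative data only:

```python
import time
import numpy as np
import xgboost as xgb

# Synthetic data large enough for threading to matter.
X = np.random.rand(50000, 50)
y = (X.sum(axis=1) > 25).astype(int)
dtrain = xgb.DMatrix(X, label=y)

# If xgboost was built with OpenMP, nthread > 1 should be noticeably faster.
for nthread in (1, 4):
    start = time.time()
    xgb.train({"objective": "binary:logistic", "nthread": nthread},
              dtrain, num_boost_round=20)
    print(f"nthread={nthread}: {time.time() - start:.1f}s")
```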

How to output feature_importance from native xgboost

蓝咒 submitted on 2019-12-05 14:55:50
Online tutorials almost uniformly use the sklearn version, where XGBClassifier has the built-in attribute feature_importances_ and the feature names can be obtained via model._Booster.feature_names. But for the native version, i.e. a model built from a DMatrix and trained via xgb.train, how do you get the feature importance? And how does the feature importance obtained by the two differ? 1. Reading the official documentation https://xgboost.readthedocs.io/en/latest/python/python_api.html, the sklearn version specifies a default importance_type parameter at initialization, so the feature_importances_ you eventually get is clearly the one computed from gain. 2. The native version has no importance_type parameter at initialization; the feature importance is actually obtained via model.get_score(importance_type="gain") (another method, get_fscore(), is simply get_score(importance_type="weight"); the two share the same implementation). Note that the default here is "weight", i.e. the number of times each feature is used for a split. To match the sklearn version you need to specify "gain", which here means the average gain. In addition
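A small sketch contrasting the two ways of reading importance described above (toy data and feature names are hypothetical; get_booster() is the public accessor for the _Booster attribute the post mentions):

```python
import numpy as np
import pandas as pd
import xgboost as xgb
from xgboost import XGBClassifier

# Hypothetical toy data with named columns.
rng = np.random.RandomState(0)
X = pd.DataFrame(rng.rand(300, 4), columns=["f_a", "f_b", "f_c", "f_d"])
y = (X["f_a"] + X["f_b"] > 1).astype(int)

# sklearn wrapper: importances reflect the importance_type chosen at init,
# and feature names are reachable through the underlying Booster.
clf = XGBClassifier(n_estimators=20).fit(X, y)
print(dict(zip(clf.get_booster().feature_names, clf.feature_importances_)))

# Native API: ask the Booster directly. The default importance_type is
# "weight" (number of splits using the feature); pass "gain" for the
# gain-based importances the post compares against.
dtrain = xgb.DMatrix(X, label=y)
bst = xgb.train({"objective": "binary:logistic"}, dtrain, num_boost_round=20)
print(bst.get_score(importance_type="weight"))
print(bst.get_score(importance_type="gain"))
```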

Python xgboost: kernel died

我们两清 submitted on 2019-12-05 13:44:06
Question: My Jupyter notebook's Python kernel keeps dying. I have run all of the following code successfully before; presently, there are issues. First, I will show you the code chunk that I am able to run successfully: import xgboost as xgb xgtrain = xgb.DMatrix(data = X_train_sub.values, label = Y_train.values) # create dense matrix of training values xgtest = xgb.DMatrix(data = X_test_sub.values, label = Y_test.values) # create dense matrix of test values param = {'max_depth':2, 'eta':1, 'silent':1,

How is the gradient and hessian of logarithmic loss computed in the custom objective function example script in xgboost's github repository?

北慕城南 submitted on 2019-12-05 07:53:45
I would like to understand how the gradient and hessian of the logloss function are computed in an xgboost sample script. I've simplified the function to take numpy arrays, and generated y_hat and y_true, which are a sample of the values used in the script. Here is a simplified example: import numpy as np def loglikelihoodloss(y_hat, y_true): prob = 1.0 / (1.0 + np.exp(-y_hat)) grad = prob - y_true hess = prob * (1.0 - prob) return grad, hess y_hat = np.array([1.80087972, -1.82414818, -1.82414818, 1.80087972, -2.08465433, -1.82414818, -1.82414818, 1.80087972, -1.82414818, -1.82414818]) y_true
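The values in the script follow from differentiating the log loss with respect to the raw score; a sketch of the standard derivation (not part of the question):

```latex
% Log loss for a single example with raw score \hat{y} and label y \in \{0, 1\}:
p = \sigma(\hat{y}) = \frac{1}{1 + e^{-\hat{y}}}, \qquad
L(y, \hat{y}) = -\bigl[\, y \log p + (1 - y)\log(1 - p) \,\bigr]

% Using \frac{\partial p}{\partial \hat{y}} = p(1 - p), the first derivative w.r.t. the raw score is
\frac{\partial L}{\partial \hat{y}} = p - y \qquad \text{(the grad in the script)}

% and the second derivative is
\frac{\partial^2 L}{\partial \hat{y}^2} = p(1 - p) \qquad \text{(the hess in the script)}
```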