xgboost

Custom Evaluation Function based on F1 for use in xgboost - Python API

偶尔善良 submitted on 2019-12-18 07:23:01
Question: I have written the following custom evaluation function to use with xgboost, in order to optimize F1. Unfortunately it returns an exception when run with xgboost. The evaluation function is the following:

    def F1_eval(preds, labels):
        t = np.arange(0, 1, 0.005)
        f = np.repeat(0, 200)
        Results = np.vstack([t, f]).T
        P = sum(labels == 1)
        for i in range(200):
            m = (preds >= Results[i, 0])
            TP = sum(labels[m] == 1)
            FP = sum(labels[m] == 0)
            if (FP + TP) > 0:
                Precision = TP / (FP + TP)
                Recall = TP / P
                if ...
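The excerpt cuts off above, but a frequent cause of exceptions with custom metrics is the signature: the xgboost Python API calls the metric as f(preds, dtrain), where the second argument is the DMatrix rather than a label array, and expects a (name, value) pair back. A minimal sketch of an F1 metric in that shape, assuming a binary objective whose preds are probabilities, and leaning on sklearn's f1_score instead of the asker's threshold sweep:

    import xgboost as xgb
    from sklearn.metrics import f1_score

    def f1_eval(preds, dtrain):
        # xgboost passes the DMatrix; labels come from get_label()
        labels = dtrain.get_label()
        # threshold the predicted probabilities at 0.5 for hard labels
        hard_preds = (preds >= 0.5).astype(int)
        # return (metric_name, metric_value)
        return "f1", f1_score(labels, hard_preds)

    # usage sketch: tell xgboost the metric should be maximized
    # bst = xgb.train(params, dtrain_matrix, num_boost_round=100,
    #                 evals=[(dvalid, "valid")], feval=f1_eval, maximize=True)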

xgboost installation issue with anaconda

纵饮孤独 submitted on 2019-12-18 02:41:54
Question: I am using Anaconda. I first switched to Python 2 (version 2.7.11):

    python -V
    Python 2.7.11 :: Continuum Analytics, Inc.

I used the following command to install xgboost in Anaconda:

    conda install -c https://conda.anaconda.org/akode xgboost

I then checked that xgboost is installed:

    conda list
    xgboost 0.3 py27_0 akode

I then ran python in the terminal, imported xgboost, and got the following error:

    import xgboost as xgb
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "//anaconda ...
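The traceback is cut off, but with this conda package it typically ends with the compiled core library failing to load. A quick hedged check that an install actually works end to end (the toy data and parameters below are illustrative):

    import numpy as np
    import xgboost as xgb

    # a two-round fit on random data exercises the compiled core library,
    # which is the part that usually fails to load after a broken install
    X = np.random.rand(20, 3)
    y = np.random.randint(2, size=20)
    dtrain = xgb.DMatrix(X, label=y)
    bst = xgb.train({"objective": "binary:logistic"}, dtrain, num_boost_round=2)
    print(bst.predict(dtrain)[:5])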

xgboost principles and applications (repost)

坚强是说给别人听的谎言 submitted on 2019-12-17 04:24:47
1. Background: There are few resources online about the theory behind xgboost; most material stays at the application level. This article works through Dr. Tianqi Chen's slides (link) and the "xgboost introduction and practice" guide (link), in the hope of reaching a deeper understanding of how xgboost works.

2. xgboost vs. GBDT: You cannot talk about xgboost without talking about GBDT (for background, see my earlier article (link)). GBDT is close to ideal both in its theoretical derivation and in practical applications, but it has one problem: training the n-th tree requires the (approximate) residuals of the (n-1)-th tree. From this angle GBDT is hard to distribute (hard, but still possible if you approach it differently), whereas xgboost starts from the angle below. Note: in the original figure, the red arrow points to l, the loss function; the red box marks the regularization term, including L1 and L2; the red circle marks the constant term. Taking the first three terms of a Taylor expansion as an approximation, we can see clearly that the final objective function depends only on the first and second derivatives of the error function at each data point.

3. Theory: (1) Define the tree's complexity. Refine the definition of f by splitting a tree into a structure part q and a leaf-weight part w; the original post illustrates this with a figure. The structure function q maps an input to a leaf index, and w gives the leaf score assigned to each index. The complexity is defined to include the number of leaf nodes in a tree plus the squared L2 norm of the scores output at each leaf. This is of course not the only possible definition, but trees learned with this definition generally perform quite well. The original post also shows a worked example of the complexity computation. Note ...
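The equations in the missing figures are the standard ones from Chen's slides; reconstructed here in LaTeX as a reference sketch (the notation follows the usual xgboost derivation, not the original images):

    % objective at boosting round t: training loss plus regularization
    \mathcal{L}^{(t)} = \sum_{i=1}^{n} l\big(y_i,\, \hat{y}_i^{(t-1)} + f_t(x_i)\big) + \Omega(f_t) + \mathrm{const}

    % second-order Taylor approximation, with
    % g_i = \partial_{\hat{y}^{(t-1)}} l  and  h_i = \partial^2_{\hat{y}^{(t-1)}} l :
    \mathcal{L}^{(t)} \approx \sum_{i=1}^{n} \Big[ g_i f_t(x_i) + \tfrac{1}{2} h_i f_t^2(x_i) \Big] + \Omega(f_t)

    % tree complexity: gamma times the number of leaves T,
    % plus the squared L2 norm of the leaf weights w
    \Omega(f) = \gamma T + \tfrac{1}{2} \lambda \sum_{j=1}^{T} w_j^2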

How to install xgboost package in python (windows platform)?

孤街浪徒 submitted on 2019-12-17 04:16:53
Question: On the homepage of xgboost, http://xgboost.readthedocs.org/en/latest/python/python_intro.html, it says that to install XGBoost you should do the following steps:

1. Run make in the root directory of the project.
2. In the python-package directory, run python setup.py install.

However, when I did this, step 1 produced the following error:

    make : The term 'make' is not recognized as the name of a cmdlet, function, script file, or operable program. Check the spelling of the name, or if a path ...
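The error only means that PowerShell has no make (a Unix build tool), so the build-from-source route stalls at step 1. A hedged workaround sketch, assuming a pre-built xgboost wheel exists on PyPI for your interpreter (true for recent releases, not guaranteed for the versions current when this was asked):

    import subprocess
    import sys

    # install a pre-built wheel instead of compiling with make;
    # sys.executable ensures pip targets the interpreter you will import from
    subprocess.check_call([sys.executable, "-m", "pip", "install", "xgboost"])

    import xgboost as xgb
    print(xgb.__version__)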

AdaBoost, GBDT, and XGBoost

拥有回忆 submitted on 2019-12-16 01:54:16
(1) The AdaBoost model is a linear combination of weak learners. Its hallmark is iteration: each round learns one weak learner, and within each round the weights of the samples misclassified by the previous round's classifier are raised while the weights of correctly classified samples are lowered. Finally, the linear combination of the weak classifiers is taken as the strong classifier, with base classifiers that have small classification error given large weights. Every iteration reduces the classification error rate on the training set.

Of course, just as every algorithm has strengths and weaknesses, AdaBoost has drawbacks of its own. AdaBoost directly supports only binary classification; faced with a multi-class problem, a multi-class model has to be trained with the one-versus-rest idea. For details on one-versus-rest, see the first article in this series, on SVM.

AdaBoost effectively reduces bias and can build a very strong ensemble on top of learners whose generalization performance is very weak. Its drawback is sensitivity to noise.

The core of AdaBoost is to iteratively train weak classifiers and compute each weak classifier's weight. Note that training a weak classifier depends on the sample weights, and the sample weights differ in every round, depending on the weak classifiers' weights and the previous round's sample weights. (This amounts to changing the distribution of the data: the classifier learns while the data distribution shifts.)

AdaBoost can also produce a regression model, by replacing the weak classifiers with regression trees and taking squared error as the loss function.

The main difference between GBDT and AdaBoost is that AdaBoost learns each round's base learner by changing sample weights, focusing on the samples the previous round misclassified, so as to gradually reduce the classification error rate on the training set ...
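A minimal sketch of the reweighting loop described above, with decision stumps as the weak learners; the function names and round count are illustrative, and labels are assumed to be in {-1, +1}:

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def adaboost_fit(X, y, n_rounds=50):
        # y must take values in {-1, +1}
        n = len(y)
        w = np.full(n, 1.0 / n)                    # uniform initial sample weights
        stumps, alphas = [], []
        for _ in range(n_rounds):
            stump = DecisionTreeClassifier(max_depth=1)
            stump.fit(X, y, sample_weight=w)       # weak learner sees the weights
            pred = stump.predict(X)
            err = np.clip(w[pred != y].sum(), 1e-10, 1 - 1e-10)
            alpha = 0.5 * np.log((1 - err) / err)  # weight of this weak classifier
            # raise weights of misclassified samples, lower the rest
            w = w * np.exp(-alpha * y * pred)
            w = w / w.sum()
            stumps.append(stump)
            alphas.append(alpha)
        return stumps, alphas

    def adaboost_predict(X, stumps, alphas):
        # sign of the weighted vote over all weak learners
        votes = sum(a * s.predict(X) for s, a in zip(stumps, alphas))
        return np.sign(votes)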

Feature importance 'gain' in XGBoost

落爺英雄遲暮 submitted on 2019-12-14 03:56:23
Question: I want to understand how the feature importance in xgboost is calculated by 'gain'. From https://towardsdatascience.com/be-careful-when-interpreting-your-features-importance-in-xgboost-6e16132588e7:

"'Gain' is the improvement in accuracy brought by a feature to the branches it is on. The idea is that before adding a new split on a feature X to the branch there were some wrongly classified elements; after adding the split on this feature, there are two new branches, and each of these branches is ...
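As a companion to the quoted definition, a hedged sketch of reading the gain-based importances from the Python API (the toy data and feature names are illustrative); importance_type="gain" reports the average loss reduction of the splits that use each feature, whereas "weight" merely counts splits:

    import numpy as np
    import xgboost as xgb

    X = np.random.rand(100, 4)
    y = np.random.randint(2, size=100)
    dtrain = xgb.DMatrix(X, label=y, feature_names=["f0", "f1", "f2", "f3"])
    bst = xgb.train({"objective": "binary:logistic"}, dtrain, num_boost_round=10)

    # average gain of the splits using each feature
    print(bst.get_score(importance_type="gain"))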

XGBoost installation issues for Python Anaconda Windows 10 (18 May 2018)

为君一笑 submitted on 2019-12-14 03:45:50
Question: Over the past several days I have tried to install XGBoost using the instructions found at:

http://xgboost.readthedocs.io/en/latest/build.html
XGBoost Installation in windows
https://github.com/dmlc/xgboost/tree/master/python-package
https://www.ibm.com/developerworks/community/blogs/jfp/entry/Installing_XGBoost_For_Anaconda_on_Windows?lang=en
https://anaconda.org/conda-forge/xgboost
http://www.picnet.com.au/blogs/guido/2016/09/22/xgboost-windows-x64-binaries-for-download/

Some of the ...

xgboost package for Python 3.6

你。 submitted on 2019-12-14 03:29:01
Question: I am trying to install xgboost 0.72 on Windows with Python 3.6.5. It shows me the following error:

    xgboost-0.72-cp37-cp37m-win32.whl is not a supported wheel on this platform.

Can anyone tell me which version of xgboost is compatible with Python 3.6.5? Thanks, zep

Answer 1: Try doing:

    pip install xgboost

It works for me. Or try this: download the xgboost whl file from [here][1] (make sure to match your Python version and system architecture, e.g. "xgboost-0.6-cp35-cp35m-win_amd64.whl" for Python 3.5 on ...
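The cpXX tag in a wheel's filename must match the running interpreter, which is exactly what went wrong here: a cp37 (Python 3.7) wheel was downloaded for a Python 3.6.5 interpreter. A quick sketch for checking what to match:

    import platform
    import sys

    # cp36 wheels match Python 3.6, cp37 match 3.7, and so on;
    # win32 vs win_amd64 must match the interpreter build, not the OS
    print(sys.version_info[:3])        # (3, 6, 5) -> needs a cp36 wheel
    print(platform.architecture()[0])  # '32bit' -> win32, '64bit' -> win_amd64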

Why am I getting a “ValueError: feature_names mismatch” when specifying the feature-name list in XGBoost for visualization?

孤人 submitted on 2019-12-14 02:11:45
Question: When I specify the feature names while defining the DMatrix, the internal data structure used by XGBoost, I get this error:

    d_train = xgboost.DMatrix(X_train, label=y_train, feature_names=list(X))
    d_test = xgboost.DMatrix(X_test, label=y_test, feature_names=list(X))
    ...
    ...
    ...
    shap_values = shap.TreeExplainer(model).shap_values(X_train)
    shap.summary_plot(shap_values, X_train)

    ValueError Traceback (most recent call last)
    <ipython-input-59-4635c450279d> in <module>()
    ----> 1 shap_values = ...
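The excerpt ends mid-traceback, but the usual cause is visible in the snippet: the model was trained on a DMatrix carrying explicit feature_names, while shap is handed the raw X_train array, whose auto-generated names (f0, f1, ...) no longer match. One hedged fix, reusing the question's variables, is to give shap data labeled with the same names, e.g. a pandas DataFrame:

    import pandas as pd
    import shap

    # wrap the raw array in a DataFrame whose columns repeat the names
    # that were passed as feature_names when building the DMatrix
    X_train_df = pd.DataFrame(X_train, columns=list(X))

    explainer = shap.TreeExplainer(model)
    shap_values = explainer.shap_values(X_train_df)
    shap.summary_plot(shap_values, X_train_df)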