xgboost

Interview Questions: XGBoost

荒凉一梦 posted on 2020-01-25 21:52:48
Table of contents: A brief introduction to XGBoost / How XGBoost differs from GBDT / Why XGBoost uses a second-order Taylor expansion / Why XGBoost can train in parallel / Why XGBoost is fast / How XGBoost prevents overfitting / How XGBoost handles missing values / Stopping conditions for growing a single tree in XGBoost / Differences between RF and GBDT / How XGBoost handles imbalanced data / Comparing LR and GBDT: in what scenarios does GBDT underperform LR / How XGBoost prunes trees / How XGBoost chooses the best split point / How XGBoost evaluates feature importance / General steps for tuning XGBoost parameters / What to do if an XGBoost model overfits / Why XGBoost is less sensitive to missing values than some other models / Differences between XGBoost and LightGBM

A brief introduction to XGBoost: we first need to talk about GBDT. GBDT is an additive model based on the boosting strategy; training uses the forward stagewise algorithm for greedy learning, and each iteration learns a CART tree to fit the residual between the predictions of the previous t-1 trees and the true values of the training samples. XGBoost applies a series of optimizations to GBDT, such as a second-order Taylor expansion of the loss function, a regularization term added to the objective, support for parallelism, and default handling of missing values. These bring huge improvements in scalability and training speed, but the core idea is largely unchanged.

How XGBoost differs from GBDT: base learner: XGBoost supports not only CART decision trees as base learners but also linear classifiers
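Since the second-order Taylor expansion comes up repeatedly in these interview questions, the per-iteration objective it refers to is worth writing out (a sketch of the standard textbook derivation, not part of the original post):

$$obj^{(t)} = \sum_{i=1}^{n} l\big(y_i,\, \hat{y}_i^{(t-1)} + f_t(x_i)\big) + \Omega(f_t) \approx \sum_{i=1}^{n} \Big[ g_i f_t(x_i) + \tfrac{1}{2} h_i f_t^2(x_i) \Big] + \Omega(f_t) + \text{const}$$

where $g_i$ and $h_i$ are the first and second derivatives of the loss with respect to the previous round's prediction $\hat{y}_i^{(t-1)}$.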

GBDT and XGBoost

情到浓时终转凉″ posted on 2020-01-25 19:56:59
Boosting methods are, in essence, an additive model fit with the forward stagewise algorithm. The AdaBoost algorithm from the previous post can also be expressed with an additive model and the forward stagewise algorithm. A boosting method whose base learners are decision trees is called a boosting tree: for classification problems the trees are CART classification trees, and for regression problems CART regression trees.

1. Forward stagewise algorithm
Introduce the additive model $f(x) = \sum_{m=1}^{M} \beta_m b(x; \gamma_m)$, where $b(x; \gamma_m)$ is a base function with parameters $\gamma_m$ and $\beta_m$ is its coefficient. Given the training data and a loss function $L(y, f(x))$, the additive model can be learned by minimizing the loss: $\min_{\beta_m, \gamma_m} \sum_{i=1}^{N} L\big(y_i, \sum_{m=1}^{M} \beta_m b(x_i; \gamma_m)\big)$. As stated, this is a very complex optimization problem with a large number of parameters to train. The forward stagewise algorithm was proposed to solve it. Its core idea: since the additive model is a sum of several models, and in boosting the models come in a fixed order, we can optimize one model at each addition step. Each step then only needs to learn one base function and one coefficient, stepwise approaching the global optimum, with the per-step loss $\min_{\beta, \gamma} \sum_{i=1}^{N} L\big(y_i, f_{m-1}(x_i) + \beta b(x_i; \gamma)\big)$.

The algorithm proceeds as follows:
1) Initialize $f_0(x) = 0$;
2) At the m-th iteration, minimize the loss: $(\beta_m, \gamma_m) = \arg\min_{\beta, \gamma} \sum_{i=1}^{N} L\big(y_i, f_{m-1}(x_i) + \beta b(x_i; \gamma)\big)$;
3) Update the model: $f_m(x) = f_{m-1}(x) + \beta_m b(x; \gamma_m)$;
4) Obtain the final additive model $f(x) = f_M(x) = \sum_{m=1}^{M} \beta_m b(x; \gamma_m)$.

AdaBoost can also be described by the forward stagewise algorithm: the input is a dataset carrying a weight distribution, and the loss function is the exponential loss.

2. GBDT algorithm
GBDT is the Gradient Boosting Decision Tree
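To make the forward stagewise idea concrete, here is a minimal sketch of gradient boosting for squared loss, where each tree fits the residuals of the current model (illustrative code, not from the original post; assumes scikit-learn):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=200, n_features=5, random_state=0)

learning_rate = 0.1
trees = []
f = np.zeros(len(y))              # f_0(x) = 0

for m in range(100):              # M = 100 boosting rounds
    residual = y - f              # negative gradient of squared loss
    tree = DecisionTreeRegressor(max_depth=3).fit(X, residual)
    trees.append(tree)
    f += learning_rate * tree.predict(X)   # f_m = f_{m-1} + lr * b_m

def predict(X_new):
    """Final additive model: sum of all shrunken trees."""
    return sum(learning_rate * t.predict(X_new) for t in trees)
```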

python xgboost on mac install

会有一股神秘感。 posted on 2020-01-23 06:08:16
Question: I am trying to install xgboost on my Mac for Python 3.4 but I'm getting the following error after "pip3 setup.py install":

    File "<string>", line 20, in <module>
    File "/private/var/folders/_x/rkkz7tjj42g9n8lqq5r0ry000000gn/T/pip-build-2dc6bwf7/xgboost/setup.py", line 28, in <module>
      execfile(libpath_py, libpath, libpath)
    NameError: name 'execfile' is not defined

When running it with the -v option to get the verbose output the error looks like this: Command "python setup.py egg_info" failed
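The error stems from execfile(), which exists only in Python 2. A hedged sketch of the usual Python 3 equivalent of that setup.py line (libpath_py and libpath mirror the names in the traceback; the path is hypothetical):

```python
# Python 3 removed execfile(path, globals, locals); exec(compile(...)) is the
# conventional replacement.
libpath = {}
libpath_py = "xgboost/libpath.py"   # illustrative path for this sketch
with open(libpath_py) as f:
    exec(compile(f.read(), libpath_py, "exec"), libpath, libpath)
```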

XGBoost crashing kernel in jupyter notebook

∥☆過路亽.° posted on 2020-01-23 03:07:13
Question: I don't know how to make the XGBoost classifier work. I am running the code below in a Jupyter notebook, and it always generates this message: "The kernel appears to have died. It will restart automatically."

    from xgboost import XGBClassifier
    model = XGBClassifier()
    model.fit(X, y)

There is no problem with importing the XGBClassifier, but it crashes upon fitting it to my data. X is a 502 by 33 all-numeric dataframe, y is the set of 0 or 1 labels for each row. Does anyone know what could be the
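One way to narrow this down (an illustrative diagnostic, not from the original post) is to fit the same model on synthetic data of the same shape; if this also kills the kernel, the problem is the environment rather than the data:

```python
import numpy as np
from xgboost import XGBClassifier

X = np.random.rand(502, 33)             # same shape as the question's dataframe
y = np.random.randint(0, 2, size=502)   # binary labels, as in the question

model = XGBClassifier(n_estimators=10)
model.fit(X, y)
print("fit completed without crashing")
```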

Which is the loss function for multi-class classification in XGBoost?

二次信任 posted on 2020-01-22 19:52:06
Question: I'm trying to find out which loss function XGBoost uses for multi-class classification. I found in this question the loss function for logistic classification in the binary case. I had thought that for the multi-class case it might be the same as in GBM (for K classes), which can be seen here, where y_k=1 if x's label is k and 0 in any other case, and p_k(x) is the softmax function. However, I have computed the first- and second-order gradients using this loss function and the Hessian doesn't match the
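For reference, the softmax cross-entropy loss the question describes, with the standard per-class derivatives in the raw score $z_k$ (textbook results, not quoted from the post; the diagonal second derivative is the quantity usually compared against XGBoost's hessian):

$$L(y, z) = -\sum_{k=1}^{K} y_k \log p_k(x), \qquad p_k(x) = \frac{e^{z_k}}{\sum_{j=1}^{K} e^{z_j}}, \qquad \frac{\partial L}{\partial z_k} = p_k - y_k, \qquad \frac{\partial^2 L}{\partial z_k^2} = p_k(1 - p_k)$$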

A few small issues encountered when installing xgboost with Anaconda

妖精的绣舞 posted on 2020-01-22 00:29:28
If you can install the package directly with pip install or conda install in the Anaconda prompt, congratulations. I had no such luck and had to download and install it myself, and a few small details cost me extra time, so I am writing them down here.

1. Where to download the package
https://www.lfd.uci.edu/~gohlke/pythonlibs/#xgboost
Download the matching package from the address above. Tags like "cp34" and "cp35" refer to your Python version, so choose according to the version you installed; "win_amd64" means a 64-bit operating system, and likewise "win32" means 32-bit. Pick the package that satisfies both conditions.

2. Where to put the downloaded file
It usually goes in the Scripts folder under your Python installation. If you forget where that folder is, right-click "Computer" on the desktop, then Properties, Advanced system settings, Environment Variables, user variables: the path you configured when installing Anaconda is listed under path.
With the file in place, open a cmd window and run: pip install <the file name you downloaded> (for example: pip install xgboost-0.90-cp36-cp36m-win_amd64.whl)

3. What problems might come up during installation?
1. "but the file does not exist": this is caused by putting the downloaded package in the wrong location
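Once the wheel has installed, a quick import check confirms the package is usable (a trivial verification snippet, added for completeness):

```python
import xgboost
print(xgboost.__version__)  # should print the wheel's version, e.g. 0.90
```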

Ensemble algorithms: GBDT and xgboost

大兔子大兔子 posted on 2020-01-18 01:47:57
As you know, when we build a model we solve for an objective function. The objective function, also called the cost function, appears throughout machine learning and generally takes the form $obj(\theta) = L(\theta) + \Omega(\theta)$, where $L(\theta)$ is the training loss, measuring how the model performs on the training set, and $\Omega(\theta)$ is the regularization penalty, measuring the model's complexity. Training loss: $L = \sum_{i=1}^{n} l(y_i, \hat{y}_i)$. Square loss: $l(y_i, \hat{y}_i) = (y_i - \hat{y}_i)^2$. Logistic loss: $l(y_i, \hat{y}_i) = y_i \ln(1 + e^{-\hat{y}_i}) + (1 - y_i) \ln(1 + e^{\hat{y}_i})$
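The gradient/hessian pairs of such losses are exactly what XGBoost consumes at each boosting round. A minimal sketch wiring the square loss above into the custom-objective hook of xgb.train (illustrative data; treat names and parameters other than the hook itself as assumptions):

```python
import numpy as np
import xgboost as xgb

def squared_error_obj(preds, dtrain):
    """Custom objective: derivatives of l = (y - yhat)^2 w.r.t. the prediction."""
    labels = dtrain.get_label()
    grad = 2.0 * (preds - labels)        # first derivative
    hess = 2.0 * np.ones_like(preds)     # second derivative is constant
    return grad, hess

X = np.random.rand(100, 4)
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + 0.1 * np.random.randn(100)
dtrain = xgb.DMatrix(X, label=y)

booster = xgb.train({"max_depth": 3, "eta": 0.1}, dtrain,
                    num_boost_round=50, obj=squared_error_obj)
```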

How do I free all memory on GPU in XGBoost?

旧城冷巷雨未停 posted on 2020-01-14 22:42:41
Question: Here is my code:

    clf = xgb.XGBClassifier(
        tree_method = 'gpu_hist',
        gpu_id = 0,
        n_gpus = 4,
        random_state = 55,
        n_jobs = -1
    )
    clf.set_params(**params)
    clf.fit(X_train, y_train, **fit_params)

I've read the answers on this question and this git issue but neither worked. I tried to delete the booster in this way:

    clf._Booster.__del__()
    gc.collect()

It deletes the booster but doesn't completely free up GPU memory. I guess it's Dmatrix that is still there but I am not sure. How can I free the whole
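A common workaround (an illustrative pattern, not from the original question) is to run each GPU training job in a child process, since all of a process's GPU memory is returned to the driver when the process exits:

```python
import multiprocessing as mp

def train_job(queue):
    # All GPU allocations made here die with this child process.
    import numpy as np
    import xgboost as xgb
    X = np.random.rand(1000, 20)
    y = np.random.randint(0, 2, size=1000)
    clf = xgb.XGBClassifier(tree_method="gpu_hist", gpu_id=0)
    clf.fit(X, y)
    queue.put(float(clf.score(X, y)))   # send only small results back

if __name__ == "__main__":
    q = mp.Queue()
    p = mp.Process(target=train_job, args=(q,))
    p.start()
    p.join()                 # GPU memory is fully freed once the child exits
    print("score:", q.get())
```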

Need help installing a specific python package using pip

岁酱吖の posted on 2020-01-14 10:07:56
Question: I have seen questions related to mine, but those answers didn't work for me. I am trying to install the xgboost package, but I got this error:

    No files/directories in C:\Users\Fatemeh\AppData\Local\Temp\pip-build-57cpr7io\xgboost\pip-egg-info (from PKG-INFO)

I have tried almost all the options, such as --no-cache-dir and --no-clean, but got the same error. I would appreciate it if you can help me fix this. I tried installing from Github and tried other methods (using cmd and setup.py scripts