xgboost

Understanding XGBoost and LightGBM, and what to focus on when tuning their parameters

戏子无情 Submitted on 2019-12-27 18:49:27
Both algorithms are ensembles of classification-and-regression tree (CART) models, so first consider how the ensemble is built. The ensembling method is Gradient Boosting. Suppose I want to fit some data: the first model I build (the broken line in the original figure) is not very accurate, so a second model is added to improve the combined result. The second model is fit to the residuals, i.e. the actual target values minus the first model's predictions, and then further models are added in the same way, so that combining them steadily improves prediction accuracy. In outline the steps are: define a loss function L(y, F(x)); minimize the total loss over the training set, sum_i L(y_i, F(x_i)); and fit each new base learner to the negative partial derivative of the loss with respect to the current prediction, -dL(y_i, F(x_i))/dF(x_i). (The original post illustrates these steps with figures that are not reproduced here.)

The base learners of both algorithms are classification-and-regression trees, i.e. trees that first partition the data and then fit a value in each partition; a decision tree is used to do the feature-based partitioning. The main questions when building a decision tree are how to find a suitable feature and a suitable split point for dividing the data set, and what criterion to judge them by. One option is exhaustive search: iterate over every value of every feature. Note that, to split the data set quickly, the data can be sorted by the feature currently being considered, so that cutting the data set does not require comparing every sample against the split threshold again. The splitting criterion (for a continuous target) is that the summed variance of the target values in the resulting subsets should decrease the most. Assuming the last column in the code below is the target value:

import numpy as np

def err_cnt(dataSet):
    '''Split criterion for a regression tree
    input:  dataSet (list): training data
    output: m * s^2 (float): total variance
    '''
    data = np.mat(dataSet)
    # total variance = sample count m times the variance s^2 of the target column
    return np.shape(data)[0] * np.var(data[:, -1])
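To make the variance-reduction criterion concrete, here is a minimal sketch (not code from the original post) of a brute-force split search that uses err_cnt and the numpy import shown just above: it tries every feature and every observed value and keeps the split whose two subsets have the smallest summed total variance. The function name choose_best_split and the row-list data layout with the target in the last column are assumptions for illustration.

def choose_best_split(dataSet):
    '''Return (feature index, threshold) giving the largest drop in total variance,
    or None if no split improves on leaving the data set whole.'''
    best_err = err_cnt(dataSet)
    best_split = None
    n_features = len(dataSet[0]) - 1                 # last column is the target
    for f in range(n_features):
        for value in sorted(set(row[f] for row in dataSet)):
            left = [row for row in dataSet if row[f] <= value]
            right = [row for row in dataSet if row[f] > value]
            if not left or not right:
                continue
            err = err_cnt(left) + err_cnt(right)     # summed total variance after the split
            if err < best_err:
                best_err, best_split = err, (f, value)
    return best_split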

Ubuntu 16.04.5: installing the NVIDIA graphics driver for GPU acceleration

此生再无相见时 Submitted on 2019-12-27 18:19:29
Ubuntu 16.04.5: installing the NVIDIA graphics driver for GPU acceleration
Tags (space separated): ops series
Part 1: system environment initialization and package preparation
Part 2: installation and test steps

Part 1: system environment initialization and package preparation

apt-get update
apt-get install vim openssh-server

Installer packages to prepare:
NVIDIA-Linux-x86_64-440.44.run
cuda_10.2.89_440.33.01_linux.run

Part 2: installation and test steps

1.1 Install the NVIDIA graphics driver
1. Download the driver version that matches your GPU from the official site.
Download page: https://www.nvidia.cn/Download/index.aspx?lang=cn
Select your GPU and driver version, click search, and download.

1.2 Install NVIDIA-Linux-x86_64-440.44.run
Disable the driver that ships with the system (nouveau):
1) vim /etc/modprobe.d/blacklist.conf
2) Add a last line: blacklist nouveau. This blacklists the graphics driver that Ubuntu ships with.
3) Run update-initramfs -u in a terminal so the change takes effect.
4) Restart the system: reboot
5) Open a terminal and run lsmod | grep nouveau; if there is no output, nouveau has been disabled successfully.
6) service lightdm
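Once the driver and CUDA toolkit are installed, a quick way to check that GPU acceleration actually works for the xgboost use cases discussed on this page is to train a tiny model with the GPU histogram tree method. This is a minimal sketch, not part of the original post; it assumes an xgboost build with GPU support and a version where tree_method='gpu_hist' is available.

import numpy as np
import xgboost as xgb

# Tiny random regression problem, just to exercise the GPU code path.
X = np.random.rand(1000, 10)
y = np.random.rand(1000)
dtrain = xgb.DMatrix(X, label=y)

params = {
    'objective': 'reg:squarederror',
    'tree_method': 'gpu_hist',   # CUDA-accelerated histogram algorithm
    'gpu_id': 0,                 # first GPU
}
bst = xgb.train(params, dtrain, num_boost_round=10)
# If this trains without a device/driver error, the GPU setup is usable.
print(bst.eval(dtrain))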

LightGBM and XGBoost Explained

自闭症网瘾萝莉.ら Submitted on 2019-12-26 17:15:55
The gradient boosting decision tree (GBDT) is one of the best performing classes of algorithms in machine learning competitions. One implementation of the gradient boosting decision tree – xgboost – is one of the most popular algorithms on Kaggle. Among the 29 challenge winning solutions published at Kaggle’s blog during 2015, 17 used xgboost. If you take a look at the kernels in a Kaggle competition, you can clearly see how popular xgboost is. The search results for all kernels that had xgboost in their titles for the Kaggle Quora Duplicate Question

Setting up AnyQ and resolving build problems

浪尽此生 Submitted on 2019-12-26 04:33:11
AnyQ setup process and solutions to build problems. Up front: I have used AnyQ in a project before, and the setup was particularly rough. When I recently tried to rebuild it with my earlier method and experience, it surprisingly would not come up, so I went through the official documentation again and summarized a few common problems and their solutions. For the setup process, see my earlier post on setting up AnyQ. The basic steps are fine, but with the current docker image and the GitHub source the build does not succeed.

Common problems

1. docker run
Change
docker run -dit -p 0.0.0.0:9999:8999 paddlepaddle/paddle:latest-dev
to
docker run -dit -p 9999:8999 paddlepaddle/paddle:latest-dev /bin/bash
With the original command the container does not stay up: docker ps shows nothing, and docker ps -a shows that the container has exited.

2. make errors
[  4%] Built target extern_leveldb
[  9%] Built target extern_jsoncpp
[ 13%] Built target extern_gtest
[ 18%] Built target extern_xgboost
[ 22%] Built target extern_eigen
[ 26%] Built target extern

Is it possible to cross-validate and save the cross-validated model with xgboost (xgb.cv) in R?

梦想的初衷 Submitted on 2019-12-25 17:13:07
Question: Almost all of the machine learning packages / functions in R allow you to obtain cross-validation performance metrics while training a model. From what I can tell, the only way to do cross-validation with xgboost is to set up an xgb.cv call like this:

clf <- xgb.cv(params = param,
              data = dtrain,
              nrounds = 1000,
              verbose = 1,
              watchlist = watchlist,
              maximize = FALSE,
              nfold = 2,
              nthread = 2,
              prediction = T)

but even with that option of prediction = T you are merely getting the prediction
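For reference, the Python interface has the same limitation: xgboost.cv returns per-round evaluation metrics, not the fitted fold models. A common workaround, sketched below in Python on the assumption that the goal is simply a saveable model tuned by cross-validation, is to use cv only to choose the number of rounds and then retrain one model with xgb.train and save that.

import numpy as np
import xgboost as xgb

X = np.random.rand(500, 5)
y = np.random.randint(0, 2, 500)
dtrain = xgb.DMatrix(X, label=y)
params = {'objective': 'binary:logistic', 'max_depth': 3, 'eta': 0.1}

# Cross-validation: returns a table of metrics per round, not trained boosters.
cv_results = xgb.cv(params, dtrain, num_boost_round=1000, nfold=2,
                    metrics='logloss', early_stopping_rounds=20, seed=0)
best_rounds = len(cv_results)    # rows are truncated at the best iteration

# Retrain on the full data for that many rounds, then save the single model.
bst = xgb.train(params, dtrain, num_boost_round=best_rounds)
bst.save_model('xgb_model.bin')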

Reading XGBoost Model in C++

試著忘記壹切 Submitted on 2019-12-25 07:15:05
Question: I trained my model in R using XGBoost and now need to do predictions in C++. I am trying to load the model file in C++ using the XGBoosterLoadModel function. My code compiles fine, but my unit-test functions are no longer discovered. When I remove the call to XGBoosterLoadModel, everything works fine and I can see my unit tests. Here is what I have in my unit test file. Any clue on what I am missing would be really appreciated:

#include <xgboost/c_api.h>
#include "stdafx.h"
#include <google

stop xgboost based on eval_metric

房东的猫 Submitted on 2019-12-25 06:55:23
Question: I am trying to run xgboost on a problem with very noisy features and am interested in stopping the number of rounds based on a custom eval_metric that I have defined. Based on domain knowledge, I know that when the eval_metric (evaluated on the training data) goes above a certain value, xgboost is overfitting, and I would like to just take the fitted model at that specific number of rounds and not proceed further. What would be the best way to achieve this? It would be somewhat in line with the
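One way to get this behaviour with the Python API, sketched below as an illustration rather than a definitive answer to the question, is to boost one round at a time, continuing from the previous booster via the xgb_model argument of xgb.train, and stop as soon as the custom metric computed on the training predictions crosses the threshold. The metric function noisy_metric and the threshold value are placeholders standing in for the asker's domain-specific ones.

import numpy as np
import xgboost as xgb

def noisy_metric(y_true, y_pred):
    # Placeholder for the domain-specific eval_metric from the question.
    return np.mean(np.abs(y_true - y_pred))

X = np.random.rand(1000, 20)
y = np.random.randint(0, 2, 1000)
dtrain = xgb.DMatrix(X, label=y)
params = {'objective': 'binary:logistic', 'max_depth': 4, 'eta': 0.1}

THRESHOLD = 0.30          # stop once the training metric crosses this value
bst = None
for round_no in range(1000):
    # Train exactly one more round, continuing from the current booster.
    bst = xgb.train(params, dtrain, num_boost_round=1, xgb_model=bst)
    score = noisy_metric(y, bst.predict(dtrain))
    if score > THRESHOLD:
        print('stopping after', round_no + 1, 'rounds, metric =', score)
        break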

Use predicted probability of one model to train another model and save as one single model

∥☆過路亽.° Submitted on 2019-12-25 02:29:01
Question: I have an XGBoost model that I am using for binary classification. It uses the features f1, f2, f3, f4, f5, f6, f7. I want to add a LogisticRegression model from sklearn that uses the XGBoost model's output together with one of its features to make the final prediction, i.e. it must take f1 and out as inputs, where out is the prediction made by the XGBoost model. I want to save these two models into a single file somehow, to make predictions in production
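One way to do this, sketched below as an illustration rather than the asker's actual setup, is to train the two stages in sequence and then pickle them together as one object with joblib. The column indices, hyperparameters, and the file name stacked_model.joblib are placeholders.

import numpy as np
import joblib
from xgboost import XGBClassifier
from sklearn.linear_model import LogisticRegression

# Toy data: columns f1..f7 and a binary target.
X = np.random.rand(1000, 7)
y = np.random.randint(0, 2, 1000)

# First stage: XGBoost on all seven features.
xgb_clf = XGBClassifier(n_estimators=100, max_depth=3)
xgb_clf.fit(X, y)
out = xgb_clf.predict_proba(X)[:, 1]        # predicted probability of class 1

# Second stage: logistic regression on [f1, out].
lr = LogisticRegression()
lr.fit(np.column_stack([X[:, 0], out]), y)

# Save both stages as a single artifact.
joblib.dump({'xgb': xgb_clf, 'lr': lr}, 'stacked_model.joblib')

# In production: load once, then chain the two stages.
models = joblib.load('stacked_model.joblib')
def predict(X_new):
    p = models['xgb'].predict_proba(X_new)[:, 1]
    return models['lr'].predict(np.column_stack([X_new[:, 0], p]))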

Unexpected behavior from xgboost in Python with Custom Evaluation Function

混江龙づ霸主 Submitted on 2019-12-24 10:47:54
Question: I am using xgboost with a custom evaluation function and I would like to implement early stopping with a limit of 150 rounds. I am getting back 4 evaluation metrics instead of the expected 2, and I do not know how to interpret them. Moreover, I am not sure how to activate early stopping with such a limit (e.g., 150 rounds). For a reproducible example:

import numpy as np

def F1_eval_gen(preds, labels):
    t = np.arange(0, 1, 0.005)
    f = np.repeat(0, 200)
    results = np.vstack([t, f]).T
    # assuming
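For context, with the xgb.train interface a custom evaluation function is passed via feval and must return a (name, value) pair, and early stopping is switched on with early_stopping_rounds together with the evals list. One metric line is produced per evaluation set per round, and any metric coming from params (or the objective's default) may be reported in addition to the custom one, which could explain seeing four values instead of two. The sketch below is a generic illustration under these assumptions, using sklearn's f1_score rather than the asker's F1_eval_gen.

import numpy as np
import xgboost as xgb
from sklearn.metrics import f1_score

def f1_eval(preds, dtrain):
    # Custom metric for xgb.train: must return (name, value).
    labels = dtrain.get_label()
    return 'f1', f1_score(labels, (preds > 0.5).astype(int))

X = np.random.rand(2000, 10)
y = np.random.randint(0, 2, 2000)
dtrain = xgb.DMatrix(X[:1500], label=y[:1500])
dvalid = xgb.DMatrix(X[1500:], label=y[1500:])

params = {'objective': 'binary:logistic', 'eta': 0.1, 'max_depth': 4}
bst = xgb.train(
    params, dtrain,
    num_boost_round=5000,
    evals=[(dtrain, 'train'), (dvalid, 'valid')],  # one metric line per set per round
    feval=f1_eval,
    maximize=True,                # higher F1 is better
    early_stopping_rounds=150,    # stop if the last eval set stops improving for 150 rounds
)
print('best iteration:', bst.best_iteration)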