xgboost

XGBoost works in PyCharm but not in Jupyter Notebook

Question: I've successfully installed XGBoost on Windows with PyCharm Python, and it works there. In Jupyter Notebook, however, it does not:

    import xgboost as xgb
    ---> 12 import xgboost as xgb
    ModuleNotFoundError: No module named 'xgboost'

Yet inside Jupyter the xgboost package appears to be installed:

    > !pip install xgboost
    Requirement already satisfied: xgboost in c:\users\sifangyou\anaconda3\lib\site-packages\xgboost-0.6-py3.6.egg
    Requirement already satisfied: numpy in c:\users\sifangyou\anaconda3\lib\site-packages
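A common cause of this symptom (the excerpt is truncated before any answer) is that the Jupyter kernel runs a different Python interpreter than the one pip installed xgboost into. A minimal sketch to check and fix this from inside a notebook cell, assuming the notebook can run shell commands with !:

    import sys

    # Show which interpreter the kernel is actually using.
    print(sys.executable)

    # Install xgboost into that exact interpreter's environment,
    # rather than whichever 'pip' happens to be first on PATH.
    !{sys.executable} -m pip install xgboost

    import xgboost as xgb
    print(xgb.__version__)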

XGBoost - H2O crashed due to an illegal memory access

Question: The H2O process crashed while running a grid search with XGBoost:

    terminate called after throwing an instance of 'thrust::system::system_error'
      what():  /tmp/xgboost/plugin/updater_gpu/src/device_helpers.cuh(387): an illegal memory access was encountered

The crash came after the following INFO messages:

    08-17 06:44:46.672 10.0.1.89:54321 14426 FJ-1-3 INFO: Checking convergence with logloss metric: 0.04519170911104479 --> 0.02811784326194906 (still improving)
    08-17 06:44:46.672 10.0.1.89:54321 14426 FJ-1-3 INFO:
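The excerpt is cut off before the grid-search code itself. For orientation, a minimal sketch of the kind of GPU-backed H2O XGBoost grid search that exercises this code path; the file name, column handling, and hyperparameter grid are hypothetical:

    import h2o
    from h2o.estimators import H2OXGBoostEstimator
    from h2o.grid.grid_search import H2OGridSearch

    h2o.init()

    # Hypothetical dataset: response in the last column of train.csv.
    train = h2o.import_file("train.csv")
    x = train.columns[:-1]
    y = train.columns[-1]
    train[y] = train[y].asfactor()  # treat the response as categorical

    grid = H2OGridSearch(
        model=H2OXGBoostEstimator(ntrees=200, backend="gpu", seed=1),
        hyper_params={"max_depth": [4, 6, 8], "learn_rate": [0.05, 0.1]},
    )
    grid.train(x=x, y=y, training_frame=train)

Since device_helpers.cuh belongs to xgboost's GPU updater, setting backend="cpu" is a common way to isolate whether the GPU code path is at fault.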

How to install XGBoost on OSX with multi-threading

Question: I'm trying to install xgboost on my Mac (OSX 10.12.1) following the guide here, but I'm running into some issues.

Step 1: Obtain gcc-6.x.x with openmp support via brew install gcc --without-multilib.

Terminal:

    Ben$ brew install gcc --without-multilib
    Error: gcc-5.3.0 already installed
    To install this version, first `brew unlink gcc`
    Ben$ brew unlink gcc
    Unlinking /usr/local/Cellar/gcc/5.3.0... 1288 symlinks removed
    Ben$ brew install gcc --without-multilib
    [26 minutes later]
    ==> Summary
    🍺 /usr/local
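Once an OpenMP-enabled build is installed, one quick way to confirm that multi-threading actually works is to time the same training run with different nthread settings. A minimal sketch, assuming a working xgboost and numpy install; the synthetic dataset is purely illustrative:

    import time
    import numpy as np
    import xgboost as xgb

    # Synthetic data, large enough that threading matters.
    X = np.random.rand(100000, 50)
    y = np.random.randint(2, size=100000)
    dtrain = xgb.DMatrix(X, label=y)

    for n in (1, 4):
        params = {"objective": "binary:logistic", "nthread": n}
        start = time.time()
        xgb.train(params, dtrain, num_boost_round=20)
        print("nthread=%d: %.1fs" % (n, time.time() - start))

    # With OpenMP working, the nthread=4 run should be noticeably faster.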

xgboost xgb.dump tree coefficients

Question: I have some sample code here:

    data(agaricus.train, package='xgboost')
    train <- agaricus.train
    bst <- xgboost(data = train$data, label = train$label, max.depth = 2,
                   eta = 1, nthread = 2, nround = 2, objective = "binary:logistic")
    xgb.dump(bst, 'xgb.model.dump', with.stats = TRUE)

After building the model, I print it out as:

    booster[0]
    0:[f28<-1.00136e-05] yes=1,no=2,missing=1,gain=4000.53,cover=1628.25
    1:[f55<-1.00136e-05] yes=3,no=4,missing=3,gain=1158.21,cover=924.5
    3:leaf=1.71218,cover=812
    4
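The question uses the R interface. For reference, the Python interface produces the same dump format through Booster.get_dump(); a minimal sketch, with random data standing in for the agaricus set (which ships with the R package):

    import numpy as np
    import xgboost as xgb

    # Toy stand-in for agaricus.train.
    X = np.random.rand(500, 60)
    y = np.random.randint(2, size=500)
    dtrain = xgb.DMatrix(X, label=y)

    params = {"max_depth": 2, "eta": 1, "nthread": 2,
              "objective": "binary:logistic"}
    bst = xgb.train(params, dtrain, num_boost_round=2)

    # One text block per tree, in the same format as the R xgb.dump()
    # output quoted above (gain and cover statistics included).
    for tree in bst.get_dump(with_stats=True):
        print(tree)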

BAT Machine Learning Interview 1000-Question Series

A few notes up front:

1. All of the content in this article comes from the "BAT Machine Learning Interview 1000 Questions" series published by July Online (七月在线).
2. Italicized text marks content I added myself; if there are errors, corrections are welcome.
3. Some links in the original have gone dead, so I have added new ones (also marked in italics); please point out anything inappropriate.
4. Some answers are copied entirely from other blogs, so I only post links to those answers; this saves space and keeps the layout cleaner. Click the corresponding question to jump to it.

Finally, I have tidied up the layout of this post: formulas are written in LaTeX syntax for easier reading, and the links have been reworked to jump directly to the relevant pages, which I hope improves the reading experience. If my editing introduced any mistakes, please point them out so we can all improve together!

1. Please briefly introduce SVM.

SVM stands for support vector machine (支持向量机 in Chinese). It is a data-oriented classification algorithm whose goal is to determine a separating hyperplane that divides the different classes of data.

Extension: support vector machine learning builds models from simple to complex: the linearly separable SVM, the linear SVM, and the nonlinear SVM. When the training data are linearly separable, hard-margin maximization learns a linear classifier, the linearly separable SVM, also called the hard-margin SVM. When the training data are approximately linearly separable, soft-margin maximization likewise learns a linear classifier, the linear SVM, also called the soft-margin SVM. When the training data are not linearly separable, the kernel trick combined with soft-margin maximization learns a nonlinear SVM.
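To make the three regimes concrete, a minimal sketch using scikit-learn (my choice of library, not the article's): the C parameter controls how soft the margin is, and the kernel choice switches between the linear and nonlinear variants:

    from sklearn.datasets import make_classification
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=200, n_features=5, random_state=0)

    # Near-hard margin: a very large C heavily penalizes margin violations.
    hard = SVC(kernel="linear", C=1e6).fit(X, y)

    # Soft margin: a moderate C tolerates some misclassified points.
    soft = SVC(kernel="linear", C=1.0).fit(X, y)

    # Nonlinear SVM: the RBF kernel trick plus a soft margin.
    nonlinear = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y)

    print(hard.n_support_, soft.n_support_, nonlinear.n_support_)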

Can someone explain how these scores are derived in these XGBoost trees?

I am looking at the image below. Can someone explain how the scores are calculated? I thought it was -1 for a "no" and +1 for a "yes", but then I can't figure out how the little girl has 0.1. And that doesn't work for tree 2 either.

The values of the leaf elements (aka "scores") - +2, +0.1, -1, +0.9 and -0.9 - were devised by the XGBoost algorithm during training. In this case, the XGBoost model was trained on a dataset where little boys (+2) appear somehow "greater" than little girls (+0.1). If you knew what the response variable was, then you could probably interpret/rationalize those contributions
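To show how such leaf scores turn into a prediction: each tree contributes the score of the leaf an instance falls into, the contributions are summed, and (assuming a binary:logistic objective; with other objectives the raw sum is used directly) the sum is mapped through a sigmoid. A minimal sketch using the leaf values quoted above:

    import math

    def predict_proba(leaf_scores):
        """Sum each tree's leaf score, then map the total through a sigmoid."""
        margin = sum(leaf_scores)
        return 1.0 / (1.0 + math.exp(-margin))

    # An instance landing in the +2 leaf of tree 1 and the +0.9 leaf of tree 2:
    print(predict_proba([2.0, 0.9]))    # ~0.95

    # An instance landing in the -1 leaf of tree 1 and the -0.9 leaf of tree 2:
    print(predict_proba([-1.0, -0.9]))  # ~0.13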

Using the Anaconda3 Docker Image

Assume the local Ubuntu server already has Docker installed. Here is how to get the Anaconda3 Docker image running:

1. Search for the image
Search for the anaconda image we want:

    docker search anaconda

2. Pull the image
We decide to pull the official anaconda3 image, continuumio/anaconda3:

    docker pull continuumio/anaconda3

Note that this image is close to 1 GB, so the download takes a while.

3. Run the image, specifying a network port
Run a bash shell in the anaconda3 image, mapping a container port to the host:

    docker run -i -t -p 12345:8888 continuumio/anaconda3 /bin/bash

where:
-i: run the container in interactive mode, usually used together with -t;
-t: allocate a pseudo-terminal for the container, usually used together with -i;
-p: specify a port mapping in the format host port:container port (the specific numbers here are arbitrary).

This drops you into the anaconda3 command line.

4. Check the Python version

    python

It is currently version 3.7.3.

5. List the installed packages
There are two ways to check; either pip or conda works:

    conda list
    pip list

6. Install xgboost (or any other package)
First, the original image does not appear to include xgboost
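As a quick sanity check after step 6 (a minimal sketch, run inside the container's Python after pip install xgboost):

    # If the install landed in this interpreter, this prints a version string.
    import xgboost as xgb

    print(xgb.__version__)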

XGBoost CV and best iteration

Question: I am using XGBoost cv to find the optimal number of rounds for my model. I would be very grateful if someone could confirm (or refute) that the optimal number of rounds is computed as follows:

    estop = 40
    res = xgb.cv(params, dvisibletrain,
                 num_boost_round=1000000000,
                 nfold=5,
                 early_stopping_rounds=estop,
                 seed=SEED,
                 stratified=True)

    best_nrounds = res.shape[0] - estop
    best_nrounds = int(best_nrounds / 0.8)

That is: the total number of rounds completed is res.shape[0], so to get the optimal number of rounds, we subtract
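One caveat, offered as an assumption since the excerpt ends before any answer: in recent xgboost versions, when early stopping fires, xgb.cv returns a DataFrame already truncated at the best iteration, so len(res) itself is the optimal round count and no estop subtraction is needed. A minimal sketch under that assumption, with an illustrative synthetic dataset:

    import numpy as np
    import xgboost as xgb

    X = np.random.rand(1000, 20)
    y = np.random.randint(2, size=1000)
    dtrain = xgb.DMatrix(X, label=y)

    params = {"objective": "binary:logistic", "eta": 0.1}
    res = xgb.cv(params, dtrain,
                 num_boost_round=10000,
                 nfold=5,
                 early_stopping_rounds=40,
                 seed=0,
                 stratified=True)

    # The returned DataFrame stops at the best iteration when early
    # stopping fires, so its length is the optimal round count.
    best_nrounds = len(res)
    print(best_nrounds)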

Parallel threading with xgboost?

Question: According to its documentation, xgboost has an n_jobs parameter. However, when I attempt to set n_jobs, I get this error:

    TypeError: __init__() got an unexpected keyword argument 'n_jobs'

The same happens with some other parameters, like random_state. I assumed this might be a version issue, but it seems I have the latest version (0.6a2, installed with pip). There isn't much needed to reproduce the error:

    from xgboost import XGBClassifier

    estimator_xGBM = XGBClassifier(max_depth = 5, learning
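A likely explanation, offered as an assumption since the excerpt is cut off before any answer: n_jobs and random_state were added to the scikit-learn wrapper in xgboost 0.7 as renamings of the older nthread and seed parameters, so a 0.6a2 install only understands the old names. A minimal sketch under that assumption:

    from xgboost import XGBClassifier

    # On xgboost <= 0.6, the sklearn wrapper uses the pre-0.7 parameter names:
    # nthread (later renamed n_jobs) and seed (later renamed random_state).
    estimator_xGBM = XGBClassifier(max_depth=5, nthread=4, seed=0)

Upgrading (pip install --upgrade xgboost) would also make the newer names available.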

How to obtain a confidence interval or a measure of prediction dispersion when using xgboost for classification?

Question: How can one obtain a confidence interval, or some other measure of prediction dispersion, when using xgboost for classification? For example, if xgboost predicts that the probability of an event is 0.9, how can the confidence in that probability be obtained? Is this confidence also assumed to be heteroskedastic?

Answer 1: To produce confidence intervals for an xgboost model you should train several models (you can use bagging for this). Each model will produce a response for the test sample - all responses will form a
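A minimal sketch of the bagging approach the answer describes, using the scikit-learn wrapper; the dataset, number of models, and interval width are illustrative choices:

    import numpy as np
    from xgboost import XGBClassifier
    from sklearn.datasets import make_classification

    X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
    X_train, y_train = X[:800], y[:800]
    X_test = X[800:]

    rng = np.random.default_rng(0)
    preds = []
    for i in range(50):
        # Bootstrap-resample the training set; fit one model per resample.
        idx = rng.integers(0, len(X_train), len(X_train))
        model = XGBClassifier(n_estimators=100, random_state=i)
        model.fit(X_train[idx], y_train[idx])
        preds.append(model.predict_proba(X_test)[:, 1])

    preds = np.vstack(preds)  # shape: (n_models, n_test_points)

    # Empirical 90% interval of the predicted probability per test point.
    lower, upper = np.percentile(preds, [5, 95], axis=0)
    print(preds.mean(axis=0)[0], (lower[0], upper[0]))

The interval width varies from point to point, which is exactly the heteroskedastic behavior the question asks about.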