xgboost

XGBoost predictor in R predicts the same value for all rows [duplicate]

喜夏-厌秋, submitted on 2019-12-24 09:47:44
Question: This question already has answers here: xgboost predict method returns the same predicted value for all rows (5 answers). Closed last year. I looked into the post on the same issue in Python, but I want a solution in R. I'm working on the Titanic dataset from Kaggle, and it looks like this:

'data.frame': 891 obs. of 13 variables:
 $ PassengerId: int 1 2 3 4 5 6 7 8 9 10 ...
 $ Survived   : num 0 1 1 1 0 0 0 0 1 1 ...
 $ Pclass     : Factor w/ 3 levels "1","2","3": 3 1 3 1 3 3 1 3 3 2 ...
 $ Age        :

An approach to tuning XGBoost parameters

一笑奈何, submitted on 2019-12-24 09:10:36
Contents: Advantages of XGBoost · Parameter tuning · General parameters · Booster parameters · Learning-task parameters · Overall procedure

Advantages of XGBoost

1. Regularization. Standard GBM (Gradient Boosting Machine) implementations have no regularization step like XGBoost's, so overfitting is often harder to control; XGBoost is well known for its "regularized boosting" technique.

2. Parallel processing. XGBoost supports parallel processing and is dramatically faster than GBM. Note: boosting itself is still sequential; the parallelism comes from preprocessing the data once and storing it in blocks, which avoids repeating the preprocessing on every call.

3. Strong compatibility. It can work directly with low-level numpy and scipy data, and in particular can train directly on sparse matrices for some large datasets.

4. Built-in cross-validation. XGBoost allows cross-validation at every boosting iteration, so the optimal number of boosting rounds can be obtained conveniently (a minimal sketch follows this list). Grid search with GBM has the major drawback that it only searches within the range the user supplies.

5. Flexibility. (1) Users can define custom optimization objectives and evaluation metrics, which opens up a whole new dimension of model usage without restricting what users can do. (2) It can handle missing values automatically, avoiding tedious preprocessing: the user supplies a value different from all other samples and passes it in as a parameter, and it is treated as the missing value. XGBoost applies different handling of missing values at different nodes and learns how to handle missing values it encounters in the future.
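As a concrete illustration of point 4, here is a minimal sketch (in Python, on a synthetic dataset, with parameter values chosen only for illustration) of using XGBoost's built-in cross-validation to pick the number of boosting rounds:

```python
import numpy as np
import xgboost as xgb

# Synthetic binary-classification data, used only to make the sketch runnable.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

dtrain = xgb.DMatrix(X, label=y)
params = {"objective": "binary:logistic", "eta": 0.1, "max_depth": 4}  # illustrative values

# xgb.cv runs k-fold cross-validation at every boosting round and can stop early,
# so the number of rows in the result gives a data-driven choice of num_boost_round.
cv_results = xgb.cv(params, dtrain, num_boost_round=500, nfold=5,
                    metrics="logloss", early_stopping_rounds=20, seed=0)
print("best number of rounds:", len(cv_results))
```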

XGBoost can't find sklearn

萝らか妹, submitted on 2019-12-24 07:39:47
Question: I'm experimenting with XGBoost and am blocked by an error I can't figure out. I have sklearn installed in the active environment and can verify it by training a sklearn RandomForestClassifier in the same notebook. When I try to train an XGBoost model I get the error XGBoostError: sklearn needs to be installed in order to use this module.

This works: clf = RandomForestClassifier(n_estimators=200, random_state=0, n_jobs=-1)

This throws the exception: clf = xgb.XGBClassifier(max_depth=3, n
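This error usually means xgboost's scikit-learn wrapper cannot import sklearn inside the interpreter the notebook kernel is actually running, even if another environment has it installed. A minimal diagnostic sketch (nothing here is specific to the asker's setup) is to check which interpreter and which package versions the kernel sees:

```python
import sys
print(sys.executable)        # the interpreter the notebook kernel is running

import sklearn, xgboost
print(sklearn.__version__)   # if this import fails here, xgboost's sklearn wrapper fails too
print(xgboost.__version__)
```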

Jupyter notebook xgboost import

痞子三分冷, submitted on 2019-12-23 07:52:08
Question: I have the problem below (I'm on a Mac). I can import xgboost from python2.7 or python3.6 in my Terminal, but I cannot import it in my Jupyter notebook:

import xgboost as xgb
ModuleNotFoundError   Traceback (most recent call last)
----> 1 import xgboost as xgb
ModuleNotFoundError: No module named 'xgboost'

Even though I run !pip3 install xgboost, it prints: Requirement already satisfied: xgboost in /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6
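A common cause is that the Jupyter kernel runs a different Python interpreter than the pip3 on the shell PATH, so the package is installed into an environment the kernel never looks at. A hedged sketch of one way to check and fix this from inside the notebook:

```python
import sys
print(sys.executable)   # the interpreter the kernel actually uses

# Install into that same interpreter rather than whichever pip3 is first on the PATH.
# In a notebook cell this can be run as:
#   !{sys.executable} -m pip install xgboost
```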

Difference in values between xgb.train and xgb.XGBRegressor in Python for certain cases

 ̄綄美尐妖づ, submitted on 2019-12-23 01:53:26
Question: I noticed that there are two possible implementations of XGBoost in Python, as discussed here and here. When I ran the same dataset through the two implementations, I noticed that the results were different. Code:

import xgboost as xgb
from xgboost.sklearn import XGBRegressor
import xgboost
import pandas as pd
import numpy as np
from sklearn import datasets
boston_data = datasets.load_boston()
df = pd.DataFrame(boston_data.data, columns=boston_data.feature_names)
df['target'] =
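A frequent source of such discrepancies is that the two interfaces do not necessarily share the same defaults (learning rate, number of rounds, and so on), so unless every parameter is set explicitly on both sides the boosters being trained differ. A minimal sketch of aligning the two; the specific parameter values are illustrative assumptions, not the defaults of either API:

```python
import numpy as np
import xgboost as xgb
from xgboost.sklearn import XGBRegressor

# Tiny synthetic regression problem, only to make the comparison runnable.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.1, size=200)

params = {"objective": "reg:squarederror", "eta": 0.1, "max_depth": 3}
n_rounds = 50

# Native interface
booster = xgb.train(params, xgb.DMatrix(X, label=y), num_boost_round=n_rounds)
pred_native = booster.predict(xgb.DMatrix(X))

# Scikit-learn wrapper with the same settings spelled out explicitly
reg = XGBRegressor(objective="reg:squarederror", learning_rate=0.1,
                   max_depth=3, n_estimators=n_rounds)
reg.fit(X, y)
pred_sklearn = reg.predict(X)

print(np.abs(pred_native - pred_sklearn).max())  # expected to be ~0 when all settings match
```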

GPU support for XGBoost and LightGBM

白昼怎懂夜的黑, submitted on 2019-12-23 00:27:17
GPU support for XGBoost and LightGBM. GBDT is a powerhouse in tabular data-mining competitions; its core idea is to iteratively train weak learners (decision trees) to obtain an optimal model, which trains well and is relatively resistant to overfitting. XGBoost and LightGBM are two frameworks that implement the GBDT algorithm. To speed up model training, this article records the process of building XGBoost and LightGBM with GPU support. The build environment is CentOS 7.2.

Installation Guide for XGBoost GPU support. Building XGBoost from source consists of two steps: build the shared library from the C++ code (libxgboost.so for Linux/OSX and xgboost.dll for Windows), then install the language package (e.g. Python).

Building the Shared Library. When building the shared library on CentOS, distributed GPU training is disabled by default and only a single GPU will be used. To enable distributed GPU training, set the option USE_NCCL=ON when building with CMake; distributed GPU training depends on NCCL2, which can be obtained from https://developer.nvidia.com/nccl
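Once a GPU-enabled build is installed, using it from Python is mostly a matter of selecting a GPU tree method. A minimal sketch, assuming a single-GPU build; the dataset and parameter values are illustrative:

```python
import numpy as np
import xgboost as xgb

# Synthetic data so the sketch is self-contained; replace with your own.
rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 20))
y = (X[:, 0] - X[:, 1] > 0).astype(int)

dtrain = xgb.DMatrix(X, label=y)
params = {
    "objective": "binary:logistic",
    "tree_method": "gpu_hist",  # requires an XGBoost build with GPU support
    "max_depth": 6,             # illustrative value
}
booster = xgb.train(params, dtrain, num_boost_round=100)
```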

How to enforce monotonic constraints in XGBoost with scikit-learn?

时光总嘲笑我的痴心妄想, submitted on 2019-12-22 10:55:55
Question: I built an XGBoost model using scikit-learn and I am pretty happy with it. As fine-tuning to avoid overfitting, I'd like to ensure monotonicity of some features, but there I start facing some difficulties... As far as I understand, there is no documentation in scikit-learn about xgboost (which I confess really surprises me, given that this situation has lasted for several months). The only documentation I found is directly on http://xgboost.readthedocs.io. On this website, I found
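For reference, the monotone_constraints parameter documented for the native XGBoost interface can also be passed through the scikit-learn wrapper; whether it is accepted directly as a constructor argument depends on the installed xgboost version, so treat the exact spelling below as an assumption to check against your version. A minimal sketch:

```python
import numpy as np
from xgboost import XGBRegressor

# Toy data: the target increases with feature 0 and decreases with feature 1.
rng = np.random.default_rng(0)
X = rng.uniform(size=(300, 2))
y = 2.0 * X[:, 0] - 3.0 * X[:, 1] + rng.normal(scale=0.05, size=300)

# One constraint per feature: +1 increasing, -1 decreasing, 0 unconstrained.
model = XGBRegressor(n_estimators=100, max_depth=3,
                     monotone_constraints="(1,-1)")
model.fit(X, y)
```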

xgb.plot.tree layout in R

无人久伴, submitted on 2019-12-22 10:44:39
Question: I was reading an xgb notebook, and the xgb.plot.tree command in the example produces a picture like this. However, when I do the same thing I get a picture like this, which is two separate graphs in different colors too. Is that normal? Are the two graphs two trees?

Answer 1: I have the same issue. According to an issue on the xgboost GitHub repository, this could be due to a change in the DiagrammeR library used by xgboost for rendering trees. https://github.com/dmlc/xgboost/issues/2640 Instead of

How are the gradient and Hessian of logarithmic loss computed in the custom objective function example script in xgboost's github repository?

[亡魂溺海], submitted on 2019-12-22 05:48:18
Question: I would like to understand how the gradient and Hessian of the logloss function are computed in an xgboost sample script. I've simplified the function to take numpy arrays, and generated y_hat and y_true, which are a sample of the values used in the script. Here is a simplified example:

import numpy as np

def loglikelihoodloss(y_hat, y_true):
    prob = 1.0 / (1.0 + np.exp(-y_hat))
    grad = prob - y_true
    hess = prob * (1.0 - prob)
    return grad, hess

y_hat = np.array([1.80087972, -1.82414818, -1
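For context, these two lines follow directly from differentiating the logistic loss with respect to the raw score. Writing p for the sigmoid of y_hat, a short standard derivation (not specific to the script) is:

```latex
% Logistic loss, with p the sigmoid of the raw score \hat{y}
\ell(\hat{y}, y) = -\bigl[\,y \log p + (1-y)\log(1-p)\,\bigr],
\qquad p = \sigma(\hat{y}) = \frac{1}{1 + e^{-\hat{y}}}

% Chain rule, using dp/d\hat{y} = p(1-p):
\frac{\partial \ell}{\partial \hat{y}}
  = -\Bigl(\frac{y}{p} - \frac{1-y}{1-p}\Bigr)\, p(1-p)
  = p - y

% Differentiating once more gives the second derivative (the "Hessian" term):
\frac{\partial^2 \ell}{\partial \hat{y}^2} = p(1-p)
```

which is exactly grad = prob - y_true and hess = prob * (1.0 - prob) in the snippet above.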

Parallel processing with xgboost and caret

青春壹個敷衍的年華, submitted on 2019-12-20 19:34:39
Question: I want to parallelize the model-fitting process for xgboost while using caret. From what I have seen in xgboost's documentation, the nthread parameter controls the number of threads used while fitting the models, in the sense of building the trees in parallel. Caret's train function performs parallelization in a different sense, for example running a process for each iteration of a k-fold CV. Is this understanding correct? If yes, is it better to: Register the number of cores (for