xgboost

Exception during xgboost prediction: can not initialize DMatrix from DMatrix

Submitted by 半城伤御伤魂 on 2020-01-13 16:27:19
Question: I trained an xgboost model in Python using the Scikit-Learn Python API and serialized it using the pickle library. I uploaded the model to ML Engine, but when I try to do online predictions, I get the following exception:

    Prediction failed: Exception during xgboost prediction: can not initialize DMatrix from DMatrix

An example of the JSON I'm using for prediction is the following:

    { "instances": [ [ 24.90625, 21.6435643564356, 20.3762376237624, 24.3679245283019, 30.2075471698113, 28.0947368421053,
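A hedged sketch of one thing worth trying, not a confirmed fix: ML Engine also accepts a raw XGBoost Booster file, so exporting the underlying Booster instead of pickling the scikit-learn wrapper may sidestep the DMatrix wrapping issue. Here model is assumed to be the trained XGBRegressor from the question:

    import xgboost as xgb

    # Export the low-level Booster rather than the wrapper object.
    model.get_booster().save_model("model.bst")

    # Load it back for a local sanity check before uploading.
    booster = xgb.Booster()
    booster.load_model("model.bst")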

Kaggle Tutorial: Intermediate Machine Learning 6: XGBoost

Submitted by 房东的猫 on 2020-01-13 08:07:55
In this lesson you will learn how to build and optimize models with gradient boosting. This method dominates many Kaggle competitions and achieves excellent results on a wide variety of datasets.

1. Introduction
For most of this course you have made predictions with the random forest method, which performs better than any single decision tree. We call random forests an "ensemble method". By definition, an ensemble method combines the predictions of several models (for example, several trees, in the case of random forests). Next we will learn about another ensemble method, called gradient boosting.

2. Gradient boosting
Gradient boosting is a method that adds models to an ensemble through an iterative cycle. It begins by initializing the ensemble with a single model, whose predictions can be quite naive. (Even if its predictions are wildly inaccurate, subsequent additions to the ensemble will correct those errors.) Then we start the cycle:
First, we use the current ensemble to generate a prediction for every observation in the dataset. To make a prediction, we add up the predictions of all models in the ensemble.
These predictions are used to compute a loss function (for example, mean squared error).
Then we use the loss function to fit a new model that will be added to the ensemble. Specifically, we determine the model's parameters so that adding the new model to the ensemble reduces the loss. (Note: the "gradient" in "gradient boosting" refers to the fact that we use gradient descent on the loss function to determine the parameters of this new model.)
Finally, we add the new model to the ensemble, and repeat…

3. Example
We begin by loading the training and validation data X_train, X_valid, y_train, and y_valid.

    import
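A minimal sketch of the example this lesson builds toward, assuming X_train, X_valid, y_train, and y_valid have already been loaded as above:

    from xgboost import XGBRegressor
    from sklearn.metrics import mean_absolute_error

    # n_estimators bounds how many times the cycle above runs; early
    # stopping halts it once validation scores stop improving.
    my_model = XGBRegressor(n_estimators=500, learning_rate=0.05)
    my_model.fit(X_train, y_train,
                 early_stopping_rounds=5,
                 eval_set=[(X_valid, y_valid)],
                 verbose=False)

    predictions = my_model.predict(X_valid)
    print("Mean Absolute Error:", mean_absolute_error(y_valid, predictions))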

Understanding num_classes for xgboost in R

Submitted by 六月ゝ 毕业季﹏ on 2020-01-11 08:08:23
Question: I'm having a lot of trouble figuring out how to correctly set the num_classes for xgboost. I've got an example using the Iris data:

    df <- iris
    y <- df$Species
    num.class = length(levels(y))
    levels(y) = 1:num.class
    head(y)
    df <- df[, 1:4]
    y <- as.matrix(y)
    df <- as.matrix(df)
    param <- list("objective" = "multi:softprob", "num_class" = 3,
                  "eval_metric" = "mlogloss", "nthread" = 8, "max_depth" = 16,
                  "eta" = 0.3, "gamma" = 0, "subsample" = 1,
                  "colsample_bytree" = 1, "min_child_weight" = 12)
    model <-

What is the difference between xgb.train and xgb.XGBRegressor (or xgb.XGBClassifier)?

Submitted by 十年热恋 on 2020-01-10 14:10:11
Question: I already know that "xgboost.XGBRegressor is a Scikit-Learn Wrapper interface for XGBoost." But do they have any other differences?

Answer 1: xgboost.train is the low-level API that trains the model via the gradient boosting method. xgboost.XGBRegressor and xgboost.XGBClassifier are the wrappers (Scikit-Learn-like wrappers, as they call them) that prepare the DMatrix and pass in the corresponding objective function and parameters. In the end, the fit call simply boils down to:

    self._Booster = train(params,
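To make the contrast concrete, a small side-by-side sketch (X and y are assumed NumPy arrays; with matching parameters both routes train equivalent boosters):

    import xgboost as xgb

    # Low-level API: you construct the DMatrix and pass the objective yourself.
    dtrain = xgb.DMatrix(X, label=y)
    booster = xgb.train({"objective": "reg:squarederror", "eta": 0.1},
                        dtrain, num_boost_round=100)

    # Scikit-Learn wrapper: fit() builds the DMatrix and forwards the
    # objective and parameters to train() internally.
    model = xgb.XGBRegressor(objective="reg:squarederror",
                             learning_rate=0.1, n_estimators=100)
    model.fit(X, y)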

How to install xgboost in python on MacOS?

Submitted by 旧巷老猫 on 2020-01-10 07:07:53
Question: I am a newbie learning Python. Can someone help me install xgboost in Python? I'm using Mac OS X 10.11. I read up online and performed the step mentioned below, but can't figure out what to do next:

    pip install xgboost -

Answer 1: It's a little more complicated if you want to use multi-threading. For the record, I am using a Mac with OS X 10.10 (Yosemite). It took me a while to work through the various issues, but it is now running nicely in my Anaconda (Py36) environment. For multi-threading you
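Whichever route the install takes, a quick sanity check from the target interpreter confirms the package is importable (the version string will vary):

    # Run this in the same Python environment you installed into.
    import xgboost
    print(xgboost.__version__)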

How Mafengwo's Recommendation Ranking Models Achieve Rapid Iteration

Submitted by 南楼画角 on 2020-01-07 15:29:42
(A Mafengwo Tech original article; WeChat ID: mfwtech)

Part.1 Mafengwo's recommendation system architecture
Mafengwo's recommendation system consists mainly of a Match (recall) stage, a Rank stage, and a Rerank stage; the overall architecture diagram is as follows:

In the match stage, the system filters a candidate set that fits the user's preferences (hundreds to thousands of items) out of a massive content pool. Building on that, the rank stage performs more precise computation and selection over the candidate set against a specific optimization objective (such as click-through rate), assigning every item an exact score, and then picks the small number of high-quality items the user is most interested in out of the hundreds or thousands of candidates.

This article focuses on one of the cores of Mafengwo's recommendation system, the ranking-algorithm platform: its overall architecture, the role it has played in supporting fast and efficient model iteration, and the practices we went through along the way, all in the service of showing users more accurate recommendations.

Part.2 Evolution of the ranking-algorithm platform
2.1 Overall architecture
At present, Mafengwo's online model-ranking platform consists of three parts: a generic data-processing module, a swappable model-production module, and a monitoring-and-analysis module. The structure of each module and the platform's overall workflow are shown in the diagram below:

2.1.1 Module functions
(1) Generic data-processing module: its core function is feature engineering and the construction of training samples, and it is the most fundamental and critical part of the whole ranking pipeline. Data sources include click and impression logs, user profiles, content profiles, and so on; the underlying data processing relies on Spark for offline batch jobs and Flink for real-time stream processing.
(2) Swappable model-production module
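For orientation, a purely illustrative sketch of the Match -> Rank -> Rerank flow described above; every name here is hypothetical, not Mafengwo's actual code:

    def recommend(user, content_pool, match_model, rank_model, rerank_rules, k=20):
        # Match: filter the massive content pool down to hundreds or
        # thousands of candidates fitting the user's preferences.
        candidates = match_model.retrieve(user, content_pool, limit=1000)

        # Rank: score each candidate precisely against the optimization
        # objective (e.g. click-through rate) and sort by that score.
        ranked = sorted(candidates,
                        key=lambda item: rank_model.score(user, item),
                        reverse=True)

        # Rerank: apply business rules (diversity, freshness, ...) before
        # returning the few high-quality items the user most wants to see.
        return rerank_rules.apply(user, ranked)[:k]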

XGBoost Spark One Model Per Worker Integration

Submitted by 走远了吗. on 2020-01-05 04:08:11
Question: I'm trying to work through this notebook: https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/1526931011080774/3624187670661048/latest.html. Using Spark version 2.4.3 and xgboost 0.90, I keep getting the error ValueError: bad input shape () when trying to execute:

    features = inputTrainingDF.select("features").collect()
    labels = inputTrainingDF.select("label").collect()
    X = np.asarray(map(lambda v: v[0].toArray(), features))
    Y = np
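On Python 3, map() returns a lazy iterator, and np.asarray() stores such an iterator as a zero-dimensional object array, which would explain the bad input shape () error. Materializing the iterator first is a plausible fix (a sketch, assuming features and labels are the collected Rows from the snippet above):

    import numpy as np

    # list() forces the map to yield a real sequence of dense arrays,
    # so np.asarray() can build a proper 2-D matrix from it.
    X = np.asarray(list(map(lambda v: v[0].toArray(), features)))
    Y = np.asarray([row[0] for row in labels])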

[Solving a Python/xgboost problem] XGBoostError: sklearn needs to be installed in order to use this module

Submitted by 情到浓时终转凉″ on 2020-01-04 09:37:17
Problem description: As is well known, to use the Python library xgboost, sklearn must be installed first. However, with both xgboost and sklearn installed, executing the following code:

    model_regr = xgboost.XGBRegressor(booster='gbtree', silent=1, nthread=-1,
                                      eta=0.01, min_child_weight=1, max_depth=10,
                                      gamma=0, subsample=1, colsample_bytree=1,
                                      colsample_bylevel=1, alpha=1,
                                      scale_pos_weight=1, objective='reg:linear',
                                      eval_metric='mae', missing=None, seed=0)
    model_regr.fit(x_train, y_train)

still produced the following error:

    XGBoostError: sklearn needs to be installed in order to use this module
    D:\Anaconda3\lib\site-packages\xgboost\sklearn.py in __init_
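A frequent cause of this error is that xgboost and scikit-learn live in different environments, so the import of sklearn fails inside xgboost even though sklearn is installed somewhere. A quick check (a sketch) from the exact interpreter that runs the failing code:

    import sys
    print(sys.executable)        # which Python is actually running?

    import sklearn               # if this raises ImportError, xgboost's
    print(sklearn.__version__)   # sklearn wrapper will fail the same way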

Tuning xgboost with xgb.train providing a validation set in R

Submitted by 南楼画角 on 2020-01-04 01:26:29
Question: Related questions here and here. The common way of tuning xgboost (i.e. nrounds) is to use xgb.cv, which performs k-fold cross-validation, for example:

    require(xgboost)
    data(iris)
    set.seed(1)
    index = sample(1:150)
    X = as.matrix(iris[index, 1:4])
    y = as.matrix(as.numeric(iris[index, "Species"])) - 1
    param = list(eta=0.1, objective="multi:softprob")
    xgb.cv(params=param, data=X, nrounds=50, nfold=5, label=y, num_class=3)

    > train.merror.mean train.merror.std test.merror.mean test.merror.std
    > 1: 0
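The question concerns R, but for comparison, the analogous pattern in the Python API (a sketch with assumed X_train/X_valid/y_train/y_valid arrays) passes the validation set via evals and lets early stopping choose the number of rounds:

    import xgboost as xgb

    dtrain = xgb.DMatrix(X_train, label=y_train)
    dvalid = xgb.DMatrix(X_valid, label=y_valid)

    params = {"eta": 0.1, "objective": "multi:softprob", "num_class": 3}
    booster = xgb.train(params, dtrain, num_boost_round=500,
                        evals=[(dtrain, "train"), (dvalid, "valid")],
                        early_stopping_rounds=10)

    print("best iteration:", booster.best_iteration)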

XGBoost Install Error

Submitted by 无人久伴 on 2020-01-03 19:55:39
Question: I am hitting an error trying to compile xgboost, and I do not have sudo access, which makes things tougher. I ran the following:

    git clone https://github.com/dmlc/xgboost.git --recursive
    cd xgboost
    make

which gives me the following error:

    g++ -std=c++0x -Wall -O3 -msse2 -Wno-unknown-pragmas -funroll-loops -Iinclude -Idmlc-core/include -Irabit/include -fPIC -fopenmp -MM -MT build/learner.o src/learner.cc >build/learner.d
    g++ -c -std=c++0x -Wall -O3 -msse2 -Wno-unknown-pragmas -funroll-loops