xgboost

XGBoost Error info.labels.size() != 0U (0 vs. 0)

北慕城南 submitted on 2021-02-08 09:16:13
Question: I am trying to run a regression problem in Python using XGBoost:

    import xgboost
    global clf
    clf = XGBRegressor(n_estimators=500, learning_rate=0.05, max_depth=6, n_jobs=4, alpha=0.1)
    clf.fit(X_train, y_train, early_stopping_rounds=5, eval_set=validation, verbose=False)
    predicted_test_tr = np.round(clf.predict(X_test))

But it raises the following error after a few iterations:

    XGBoostError: b'[10:56:23] src/objective/regression_obj.cc:43: Check failed: info.labels_.size() != 0U (0 vs.
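The failed check `info.labels_.size() != 0U` says that a dataset with zero labels reached the regression objective. The excerpt does not show how `validation` was built, so the cause here is unknown; one possibility worth ruling out is an `eval_set` entry whose labels are empty or missing. A minimal pre-flight sketch in plain Python follows. `check_eval_set` is a hypothetical helper (not part of XGBoost), and it assumes `eval_set` should be a list of `(X, y)` pairs:

```python
def check_eval_set(eval_set):
    """Return a list of problems found in an eval_set-style argument.

    Hypothetical helper, not part of XGBoost. Assumes eval_set should be a
    list of (X, y) pairs in which y is non-empty and has one label per row.
    """
    if not isinstance(eval_set, list):
        return [f"expected a list of (X, y) pairs, got {type(eval_set).__name__}"]
    problems = []
    for i, pair in enumerate(eval_set):
        if not isinstance(pair, (tuple, list)) or len(pair) != 2:
            problems.append(f"entry {i} is not an (X, y) pair")
            continue
        X, y = pair
        if y is None or len(y) == 0:
            problems.append(f"entry {i} has no labels")
        elif len(X) != len(y):
            problems.append(f"entry {i} has {len(X)} rows but {len(y)} labels")
    return problems


# Toy data standing in for real validation features and targets.
X_val = [[1.0, 2.0], [3.0, 4.0]]
y_val = [0.5, 1.5]

print(check_eval_set([(X_val, y_val)]))  # well-formed: no problems reported
print(check_eval_set([(X_val, [])]))     # empty label list is flagged
print(check_eval_set((X_val, y_val)))    # bare tuple instead of a list is flagged
```

Running such a check on `validation` before calling `fit` would at least show whether the evaluation data carries labels at all.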

memory error when todense in python using CountVectorizer

非 Y 不嫁゛ submitted on 2021-02-08 09:11:38
Question: Here is my code and the memory error I get when calling todense(). I am using a GBDT model and am wondering if anyone has good ideas for working around the memory error. Thanks.

    for feature_colunm_name in feature_columns_to_use:
        X_train[feature_colunm_name] = CountVectorizer().fit_transform(X_train[feature_colunm_name]).todense()
        X_test[feature_colunm_name] = CountVectorizer().fit_transform(X_test[feature_colunm_name]).todense()
    y_train = y_train.astype('int')
    grd = GradientBoostingClassifier(n_estimators=n
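`todense()` materializes every zero of the document-term matrix, so its footprint grows with rows times vocabulary size no matter how sparse the counts are. A rough back-of-the-envelope sketch: the shape and the number of non-zeros below are made-up figures for illustration, and the CSR estimate assumes 8-byte values with 4-byte indices (actual dtypes depend on the data and the library version).

```python
def dense_bytes(n_rows, n_cols, itemsize=8):
    """Bytes needed to store every cell, zeros included."""
    return n_rows * n_cols * itemsize


def csr_bytes(n_rows, nnz, itemsize=8, index_size=4):
    """Approximate bytes for a CSR matrix: values + column indices + row pointers."""
    return nnz * itemsize + nnz * index_size + (n_rows + 1) * index_size


# Illustrative figures only: 100k documents, 50k vocabulary terms,
# about 20 distinct terms per document.
n_rows, n_cols, nnz = 100_000, 50_000, 2_000_000

print(f"dense: {dense_bytes(n_rows, n_cols) / 1e9:.1f} GB")  # dense: 40.0 GB
print(f"CSR:   {csr_bytes(n_rows, nnz) / 1e6:.1f} MB")       # CSR:   24.4 MB
```

Under these assumptions the dense form is over a thousand times larger, which is why keeping the `CountVectorizer` output sparse is usually the first thing to try; whether the downstream estimator accepts sparse input should be checked in its documentation.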

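Ant Financial: Ultra-Large-Scale Distributed Computing System + Ultra-Large-Scale Distributed Optimization Algorithms

五迷三道 submitted on 2021-02-08 05:51:44
AI, Big Data and Deep Learning (WeChat official account: weic2c) In recent years, with the emergence of "big" data and "big" models, distributed machine learning algorithms have attracted wide attention in academia and industry. To meet this pressing need, Alibaba Group and Ant Financial designed their own distributed platform, KunPeng. KunPeng combines distributed systems with parallel optimization algorithms to solve a range of problems raised by large-scale machine learning. It covers data/model parallelism, load balancing, model synchronization, sparse representation, and industrial-grade fault tolerance, and it also provides well-encapsulated, easy-to-call APIs that let ordinary machine learning practitioners develop distributed algorithms, lowering the cost of use and improving efficiency. The paper was presented as an oral presentation at this year's KDD (Applied Data Science Track). The paper, "KunPeng: Parameter Server based Distributed Learning Systems and Its Applications in Alibaba and Ant Financial", was written jointly by 周俊, 李小龙, 赵沛霖, 陈超超, 李龙飞, 杨新星, 崔卿, 余晋, 陈绪, 丁轶, and 漆远 of Ant Financial's Artificial Intelligence Department and the Alibaba Cloud team. The experiments described in the paper were run on data with billions of samples and features. The results show that KunPeng's design greatly improves the performance of a range of algorithms, including FTRL, Sparse-LR, and MART. In addition, in Alibaba's "Double 11" shopping festival and Ant Financial's transaction risk detection, KunPeng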

XGBOOST feature name error - Python

巧了我就是萌 submitted on 2021-02-08 04:06:56
Question: This question has probably been asked many times in different forms. However, my problem is that when I use XGBClassifier() with production-like data, I get a feature name mismatch error. I am hoping someone can tell me what I am doing wrong. Here is my code. BTW, the data is completely made up:

    import pandas as pd
    from sklearn.preprocessing import LabelEncoder, OneHotEncoder
    from sklearn.model_selection import train_test_split, KFold, cross_val_score
    from sklearn.metrics import
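A feature name mismatch generally means the columns seen at prediction time differ from those seen at training time in name, count, or order; encoding production data separately from training data is one common way to end up there. The excerpt is cut off before the relevant code, so this is only a diagnostic sketch in plain Python. `diff_feature_names` is a hypothetical helper and the column names are invented:

```python
def diff_feature_names(train_cols, predict_cols):
    """Compare training-time and prediction-time column names.

    Hypothetical helper for illustration; it only inspects the names.
    """
    train_cols, predict_cols = list(train_cols), list(predict_cols)
    train_set, predict_set = set(train_cols), set(predict_cols)
    return {
        "missing_at_predict": [c for c in train_cols if c not in predict_set],
        "unexpected_at_predict": [c for c in predict_cols if c not in train_set],
        "same_names_wrong_order": train_set == predict_set and train_cols != predict_cols,
    }


# Invented column names: one-hot columns differ because the production
# sample contains a category the training data never had, and lacks another.
train_cols = ["age", "city_london", "city_paris", "income"]
prod_cols = ["age", "city_london", "income", "city_tokyo"]

report = diff_feature_names(train_cols, prod_cols)
print(report)
```

With pandas data, one way to align a production frame to the training layout is `DataFrame.reindex(columns=train_cols, fill_value=0)`, though fitting the encoder once on training data and reusing it is usually the cleaner fix.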

Can the R version of lime explain xgboost models with count:poisson objective function?

拟墨画扇 submitted on 2021-02-07 19:14:20
Question: I generated a model using xgb.train with the "count:poisson" objective function, and I get the following error when trying to create the explainer:

    Error: Unsupported model type

Lime works when I replace the objective with something else, such as reg:logistic. Is there a way to explain count:poisson in lime? Thanks. Reproducible example:

    library(xgboost)
    library(dplyr)
    library(caret)
    library(insuranceData) # example dataset https://cran.r-project.org/web/packages/insuranceData/insuranceData.pdf

xgboost.plot_tree: binary feature interpretation

蓝咒 submitted on 2021-02-07 12:39:20
Question: I've built an XGBoost model and want to examine the individual estimators. For reference, this was a binary classification task with discrete and continuous input features. The input feature matrix is a scipy.sparse.csr_matrix. When I went to examine an individual estimator, however, I had difficulty interpreting the binary input features, such as f60150 below. The real-valued f60150 in the bottommost chart is easy to interpret: its criterion is in the expected range of that feature.

xgboost on Sagemaker notebook import fails

人走茶凉 submitted on 2021-02-07 11:15:03
Question: I am trying to use XGBoost in a SageMaker notebook. I am using the conda_python3 kernel, and the following packages are installed: py-xgboost-mutex, libxgboost, py-xgboost, py-xgboost-gpu. But when I try to import xgboost, the import fails:

    ModuleNotFoundError Traceback (most recent call last)
    <ipython-input-5-5943d1bfe3f1> in <module>()
    ----> 1 import xgboost as xgb
    ModuleNotFoundError: No module named 'xgboost'

Answer 1: In SageMaker notebooks, use the steps below. a) If in a notebook: i) !type python3
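A `ModuleNotFoundError` despite installed conda packages often means the packages went into a different environment than the one the notebook kernel runs. The answer is cut off here, so the following is not its continuation, only a standard-library sketch for checking which interpreter the kernel uses and whether that interpreter can see a module. `describe_module` is a hypothetical helper name:

```python
import importlib.util
import sys


def describe_module(name):
    """Report the running interpreter and whether it can locate a top-level module."""
    spec = importlib.util.find_spec(name)  # None when the module is not importable
    return {
        "interpreter": sys.executable,
        "found": spec is not None,
        "location": getattr(spec, "origin", None),
    }


print(describe_module("xgboost"))
```

If `found` is `False`, installing with that same interpreter (for example `!{sys.executable} -m pip install xgboost` in a notebook cell) targets the kernel's own environment rather than whichever `python3` happens to be first on the shell's path.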
