xgboost

xgboost on Sagemaker notebook import fails

◇◆丶佛笑我妖孽 submitted on 2021-02-07 11:13:24
Question: I am trying to use XGBoost in a SageMaker notebook. I am using the conda_python3 kernel, and the following packages are installed:
py-xgboost-mutex
libxgboost
py-xgboost
py-xgboost-gpu
But when I try to import xgboost, the import fails:
ModuleNotFoundError Traceback (most recent call last)
<ipython-input-5-5943d1bfe3f1> in <module>()
----> 1 import xgboost as xgb
ModuleNotFoundError: No module named 'xgboost'
Answer 1: In SageMaker notebooks, use the steps below.
a) If in a notebook:
i) !type python3
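The `!type python3` step in the answer checks which interpreter the shell resolves. A related check can be done from inside the kernel itself. The sketch below is a minimal illustration using only the standard library; the helper names `pip_install_command` and `is_importable` are hypothetical, and it assumes (the excerpt does not confirm this) that the packages were installed into a different environment from the one the kernel runs in.

```python
import importlib.util
import sys


def pip_install_command(package):
    """Build a pip command bound to the interpreter that runs this kernel."""
    return [sys.executable, "-m", "pip", "install", package]


def is_importable(module_name):
    """Report whether a top-level module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


print("kernel interpreter:", sys.executable)
print("xgboost importable:", is_importable("xgboost"))
print("install with:", " ".join(pip_install_command("xgboost")))
```

Running the printed command (for example through `subprocess.check_call`) installs into the kernel's own environment rather than whichever `python3` happens to be first on the shell's PATH.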

Reproduce LightGBM Custom Loss Function for Regression

这一生的挚爱 submitted on 2021-02-07 09:40:39
Question: I want to reproduce the custom loss function for LightGBM. This is what I tried:
lgb.train(params=params, train_set=dtrain, num_boost_round=num_round, fobj=default_mse_obj)
With default_mse_obj being defined as:
residual = y_true - y_pred.get_label()
grad = -2.0*residual
hess = 2.0+(residual*0)
return grad, hess
However, the eval metrics differ between the default "regression" objective and the custom loss function defined above. I would like to know, what is the default function used by
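For reference, the derivatives of a squared-error loss can be written out and checked numerically. The sketch below is plain Python and independent of LightGBM; the names `mse_grad_hess` and `numeric_grad` are hypothetical. It assumes the loss is the sum of (pred - label)^2 and differentiates with respect to each prediction. Whether this matches the built-in "regression" objective is not established by the excerpt above.

```python
def mse_grad_hess(preds, labels):
    """First and second derivatives of sum((pred - label)**2) per prediction."""
    grad = [2.0 * (p - y) for p, y in zip(preds, labels)]
    hess = [2.0 for _ in preds]  # constant second derivative
    return grad, hess


def numeric_grad(preds, labels, i, eps=1e-6):
    """Central finite difference of the loss with respect to preds[i]."""
    def loss(ps):
        return sum((p - y) ** 2 for p, y in zip(ps, labels))

    up = list(preds)
    up[i] += eps
    down = list(preds)
    down[i] -= eps
    return (loss(up) - loss(down)) / (2 * eps)


preds = [0.5, 2.0, -1.0]
labels = [1.0, 2.0, 3.0]
grad, hess = mse_grad_hess(preds, labels)
print(grad, hess)
print(numeric_grad(preds, labels, 0))
```

Note the sign convention: the gradient is taken with respect to the prediction, so it is positive when the prediction is above the label. In LightGBM's custom-objective callback the predictions arrive as the first argument and the dataset as the second, so argument names such as `y_true` and `y_pred` are easy to mix up.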

Cannot save model using PySpark xgboost4j

女生的网名这么多〃 submitted on 2021-02-07 08:12:20
Question: I have a small PySpark program that uses xgboost4j and xgboost4j-spark to train on a dataset held in a Spark DataFrame. Training completes, but it seems I cannot save the model.
Current library versions:
PySpark 2.4.0
xgboost4j 0.90
xgboost4j-spark 0.90
Spark submit args:
os.environ['PYSPARK_SUBMIT_ARGS'] = "--py-files dist/DNA-0.0.2-py3.6.egg " \
"--jars dna/resources/xgboost4j-spark-0.90.jar," \
"dna/resources/xgboost4j-0.90.jar pyspark-shell"
The training process is as

Names features importance plot after preprocessing

房东的猫 submitted on 2021-02-06 15:48:37
Question: Before building a model I scale the features like this:
X = StandardScaler(with_mean = 0, with_std = 1).fit_transform(X)
and afterwards I build a feature importance plot:
xgb.plot_importance(bst, color='red')
plt.title('importance', fontsize = 20)
plt.yticks(fontsize = 10)
plt.ylabel('features', fontsize = 20)
The problem is that instead of the feature names we get f0, f1, f2, f3, etc. How can I get the feature names back? Thanks.
Answer 1: First we get the list of feature names before preprocessing:
dtrain = xgb.DMatrix(
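The names are lost because `fit_transform` returns a plain array, so XGBoost falls back to f0, f1, and so on. Besides passing `feature_names` when constructing the `xgb.DMatrix`, the default keys can be mapped back afterwards. The sketch below shows that mapping in plain Python; `name_importance`, the column names, and the score values are all hypothetical, and the `scores` dict only imitates the shape of what `bst.get_score()` returns.

```python
def name_importance(scores, feature_names):
    """Map XGBoost's default keys f0, f1, ... to readable column names."""
    mapping = {f"f{i}": name for i, name in enumerate(feature_names)}
    # Keys that are not in the mapping are kept unchanged.
    return {mapping.get(key, key): value for key, value in scores.items()}


# Hypothetical column names saved before scaling (e.g. list(df.columns)).
feature_names = ["age", "income", "height"]
# Hypothetical importance scores keyed by default feature names.
scores = {"f0": 12.0, "f2": 3.0}

named = name_importance(scores, feature_names)
print(named)
```

As far as I know, `xgb.plot_importance` also accepts a dict of scores, so the renamed dict could be plotted directly; treat that as something to verify against the installed version.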

Secrets handed down by nunchaku "master" 鱼佬: how do you tackle a data mining algorithm competition?

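自作多情 submitted on 2021-02-06 15:11:16
Once we have a grounding in the basic theory of machine learning and data mining, entering a data algorithm competition exposes us to real business problems and real data and moves theory into engineering practice. Thinking problems through repeatedly during the competition also strengthens our understanding of the theory. In this talk I draw on my own competition experience and the general state of the community to discuss how to approach a data mining algorithm competition and what needs to be done before, during, and after it. I close with a case study showing how I worked through one competition.
Note: the full video of this talk was streamed at 7 pm on Alibaba Tianchi and can be replayed at https://tianchi.aliyun.com/course/live?liveId=41153
Outline
Why take part in data mining competitions? What can they bring?
What background knowledge and skills does a competition require?
How do you choose a competition that suits you?
The main modules of a competition
The most important things during a competition
A good post-competition write-up matters more than the competition itself
Case study (Tianchi "National Urban Computing AI Challenge")
Why take part in data mining competitions?
From theoretical knowledge to engineering application; real data and more project experience
A plus when job hunting, valued by companies; companies run competitions to select talent
Prize money as an incentive (generous)
Making friends, learning, competing against experts
What background knowledge and skills does a competition require?
Theory: evaluation metrics, data analysis, feature engineering, common models
Tools
Language of choice: Python
Visualization tools: Matplotlib, Seaborn
Data processing tools: Pandas, NumPy
Machine learning libraries: Sklearn, XGBoost, LightGBM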

50 classic interview questions | with reference answers

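亡梦爱人 submitted on 2021-01-31 01:46:03
Source: 计算机视觉研究院 (Computer Vision Research Institute) column. Author: Edison_G
If you are interested, learn how to answer interview questions well. I hope everyone lands the offer they want!
1. Please explain in detail the principle of the support vector machine (SVM).
The support vector machine, usually abbreviated SVM, is, informally, a binary classification model. Its basic form is the linear classifier with the largest margin in feature space. Its learning strategy is margin maximization, which ultimately reduces to solving a convex quadratic programming problem.
2. Which machine learning algorithms do not need normalization?
In practice, the models that need normalization are:
1. Models based on distance computation: KNN.
2. Models fitted by gradient descent: linear regression, logistic regression, support vector machines, neural networks.
Tree-based models such as decision trees and random forests do not need normalization, because they do not depend on the magnitude of a variable's values but on the distribution of the variables and the conditional probabilities between them.
3. Why do tree structures not need normalization?
Because rescaling the values does not change where the split points fall, so it has no effect on the structure of the tree model. Splits are found by sorting on feature values; if the sort order is unchanged, the branch each sample falls into and the split points are unchanged too. Moreover, tree models cannot be trained by gradient descent: building a tree (a regression tree) finds the optimum by searching for the best split point, so the tree model is a step function, is not differentiable at the steps, and taking derivatives is meaningless. Normalization is therefore unnecessary.
4. In k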
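The claim in question 3, that rescaling leaves the sort order and therefore the split unchanged, can be checked on a toy example. The sketch below is a simplified illustration in plain Python, not any library's tree builder: `best_split_partition` is a hypothetical helper that searches one feature for the split minimizing the summed squared error of the two sides, and the data are made up.

```python
def best_split_partition(xs, ys):
    """Return the row indices sent left by the best squared-error split on xs."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])

    def sse(idx):
        # Sum of squared deviations of the targets from their mean.
        mean = sum(ys[i] for i in idx) / len(idx)
        return sum((ys[i] - mean) ** 2 for i in idx)

    best_cost, best_left = float("inf"), None
    for k in range(1, len(order)):
        if xs[order[k - 1]] == xs[order[k]]:
            continue  # cannot split between equal feature values
        left, right = order[:k], order[k:]
        cost = sse(left) + sse(right)
        if cost < best_cost:
            best_cost, best_left = cost, frozenset(left)
    return best_left


xs = [3.0, 10.0, 1.0, 8.0, 2.0, 9.0]
ys = [1.1, 5.0, 0.9, 5.2, 1.0, 4.8]

# Standardize the feature (zero mean, unit variance).
mean = sum(xs) / len(xs)
std = (sum((x - mean) ** 2 for x in xs) / len(xs)) ** 0.5
scaled = [(x - mean) / std for x in xs]

raw_left = best_split_partition(xs, ys)
scaled_left = best_split_partition(scaled, ys)
print(sorted(raw_left), sorted(scaled_left), raw_left == scaled_left)
```

The threshold value itself changes with the scale, but the rows on each side of it do not, which is why the fitted tree makes the same predictions.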

XGBoost error 'DMatrix' object does not support indexing

僤鯓⒐⒋嵵緔 submitted on 2021-01-29 10:24:02
Question: I am trying to use the XGBoost library with the .train function and DMatrix, but I am stuck on this error:
Traceback (most recent call last):
File "", line 1, in
runfile('E:/CrossValidation.py', wdir='E:/')
File "C:\Users\users\Anaconda3\envs\Lightgbm\lib\site-packages\spyder\utils\site\sitecustomize.py", line 705, in runfile
execfile(filename, namespace)
File "C:\Users\users\Anaconda3\envs\Lightgbm\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
exec

ModuleNotFoundError: No module named 'xgboost.sklearn'

こ雲淡風輕ζ submitted on 2021-01-28 11:54:12
Question: I'm trying to import xgboost in a Jupyter notebook but get the following error:
---------------------------------------------------------------------------
ModuleNotFoundError Traceback (most recent call last)
<ipython-input-9-a585b270d0df> in <module>
1 import pandas as pd
2 import numpy as np
----> 3 import xgboost
~/.local/lib/python3.6/site-packages/xgboost/__init__.py in <module>
14 from . import tracker # noqa
15 from .tracker import RabitTracker # noqa
---> 16 from . import dask
17 try
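The traceback is cut off, so the cause cannot be read from it. One possibility, stated here purely as an assumption, is an incomplete or mixed install under `~/.local`. The sketch below inspects a package's directory without importing the package, which matters because importing a broken package would raise the same error. The helpers `package_dirs` and `has_submodule_file` are hypothetical names and use only the standard library.

```python
import importlib.util
from pathlib import Path


def package_dirs(package):
    """Directories a top-level package would load from, found without importing it."""
    spec = importlib.util.find_spec(package)
    if spec is None or spec.submodule_search_locations is None:
        return []
    return [Path(p) for p in spec.submodule_search_locations]


def has_submodule_file(package, submodule):
    """Check on disk for package/submodule.py or package/submodule/__init__.py."""
    for directory in package_dirs(package):
        if (directory / f"{submodule}.py").is_file():
            return True
        if (directory / submodule / "__init__.py").is_file():
            return True
    return False


print("xgboost found in:", [str(d) for d in package_dirs("xgboost")])
print("sklearn.py present:", has_submodule_file("xgboost", "sklearn"))
```

If the directory listed is not the one expected, or the file is missing, reinstalling the package into the kernel's own environment is a reasonable next step.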