xgboost | 易学教程

Machine Learning Algorithms: LightGBM

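In this article we continue with another evolved version of the GBDT model: LightGBM. LightGBM is a recent addition to the family of boosting ensemble models and is provided by Microsoft. Like XGBoost, it is an efficient implementation of GBDT. In principle it is similar to both GBDT and XGBoost: each uses the negative gradient of the loss function as an approximation of the residual for the current decision tree, and fits a new decision tree to it.

LightGBM outperforms XGBoost in many respects. Its advantages are: faster training, lower memory usage, higher accuracy, support for parallel learning, the ability to handle large-scale data, and direct support for categorical features.

According to the article's experimental data, LightGBM is nearly 10 times faster than XGBoost, uses roughly 1/6 of XGBoost's memory, and also improves accuracy. These striking results raise two questions: XGBoost already works very well, so why pursue a model that is faster and uses less memory? And what are the technical details of the improvements made to the GBDT algorithm?

Motivation for LightGBM

Common machine learning algorithms such as neural networks can be trained in mini-batches, so the size of the training data is not limited by memory. GBDT, by contrast, has to traverse the entire training set several times in every iteration. Loading the whole training set into memory limits how large it can be; not loading it means that repeatedly reading and writing the data costs a great deal of time. For industrial-scale data in particular, an ordinary GBDT algorithm cannot meet the need.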
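
The residual-fitting idea shared by GBDT, XGBoost, and LightGBM can be illustrated with a toy sketch. It is not how LightGBM is implemented: it assumes squared-error loss, a single numeric feature, and one-split decision stumps in place of full trees, and every function name and data value below is made up for illustration.

```python
"""Toy gradient boosting with decision stumps (illustrative assumptions only)."""


def fit_stump(xs, targets):
    """Find the single split on x that minimizes squared error on targets."""
    best = None
    for threshold in sorted(set(xs)):
        left = [t for x, t in zip(xs, targets) if x < threshold]
        right = [t for x, t in zip(xs, targets) if x >= threshold]
        if not left or not right:
            continue
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = sum((t - left_mean) ** 2 for t in left)
        error += sum((t - right_mean) ** 2 for t in right)
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    if best is None:  # every x is identical, so no split is possible
        mean = sum(targets) / len(targets)
        return (xs[0], mean, mean)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x < threshold else right_value


def fit_gbdt(xs, ys, n_rounds=50, learning_rate=0.3):
    """Boost stumps on the negative gradient of the squared-error loss."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # For L = 1/2 * (y - F)^2 the negative gradient w.r.t. F is y - F,
        # which is exactly the residual of the current model.
        neg_grad = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, neg_grad)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, x)
                 for p, x in zip(preds, xs)]
    return base, stumps, learning_rate


def predict_gbdt(model, x):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)


def mse(model, xs, ys):
    return sum((predict_gbdt(model, x) - y) ** 2
               for x, y in zip(xs, ys)) / len(ys)


# Made-up data, used only to show the training error shrinking.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
ys = [1.0, 1.2, 0.9, 3.1, 2.9, 3.0, 5.2, 5.0]
baseline = fit_gbdt(xs, ys, n_rounds=0)   # predicts the mean only
boosted = fit_gbdt(xs, ys, n_rounds=50)
print(mse(baseline, xs, ys), mse(boosted, xs, ys))
```

Note that every boosting round scans all of the training points, which is the full-data traversal that the motivation paragraph describes as the bottleneck for plain GBDT.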

Service Performance Optimization Process

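Background

Our team runs several online services. Apart from differences in business logic, they all process requests in essentially the same way. The execution logic of these services is very simple, but there are a few problems: per-machine QPS is low, so many machines are needed; 99th-percentile latency is much longer than it should be in theory; and when the service reaches its load limit, the load on the server itself is still very low.

For a company with deep pockets, none of these are really problems: if performance falls short, add machines! When upstream came to us yet again about timeouts, and for the sake of a prime-of-life programmer's dignity, I happened to have the time and decided not to simply add machines this time, but to optimize the program and put a rarely used skill to work.

Tech stack

Language: Java
Communication: Thrift, with THsHaServer as the server mode
Redis client: Jedis
Model: xgboost
Execution environment: Docker, 8 cores, 8 GB
Cache cluster: speaks the Redis protocol, but is implemented differently from Redis

Results after optimization

Optimization process

To improve a program's performance, you first need to find where its bottleneck is and then optimize for that specifically.

Flow analysis

Flowchart before optimization

This flowchart omits the business logic and shows only how the threads interact. The part that builds the Redis requests actually uses two ForkJoin thread pools, but their logic is so similar that they are merged into one flow for simplicity. Two steps in the flowchart are marked in dark red, which makes the two places that clearly affect performance easy to spot. Taking the service's execution environment into account, the following problems can be identified:

Too many threads

Because the operating system schedules threads preemptively, frequent context switches between threads cause several problems: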

XGBoost Hands-On Practice

1. Install xgboost, then import it

import xgboost

2. Train a model and use it for prediction

# First XGBoost model for Pima Indians dataset
from numpy import loadtxt
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# load data
dataset = loadtxt('pima-indians-diabetes.csv', delimiter=",")

# split data into X and y
X = dataset[:, 0:8]
Y = dataset[:, 8]

# split data into train and test sets
seed = 7
test_size = 0.33
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=test_size, random_state=seed)

# fit model no

Installing NGBoost, XGBoost, and CatBoost from the Tsinghua Mirror

pip install catboost -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install ngboost -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install xgboost -i https://pypi.tuna.tsinghua.edu.cn/simple

Common prediction models for data competitions: LGB, XGB, and ANN

LGB

lightgbm: competition datasets keep getting larger, so when you want high prediction accuracy while also reducing memory usage and speeding up training, lightgbm is a very good choice. It can achieve prediction quality similar to xgboost.

def LGB_predict(train_x, train_y, test_x, res, index):
    print("LGB test")
    clf = lgb.LGBMClassifier(boosting_type='gbdt', num_leaves=31, reg_alpha=0.0, reg_lambda=1, max_depth=-1, n_estimators=5000, objective='binary

Converting XGBoost tree structure dump file to MQL4 (C like language) code

Question

I have a dump file of an XGBoost tree structure trained in Python. The structure has 377 trees, and the file has approximately 50,000 lines. I would like to convert this structure to MQL4 code, or C code so to speak. The text file looks something like this:

booster[0]:
0:[inp0<6.85417] yes=1,no=2,missing=1
  1:[inp10<1.00054] yes=3,no=4,missing=3
    3:[inp21<0.974632] yes=7,no=8,missing=7
      7:[inp22<1.01021] yes=15,no=16,missing=15
        15:[inp15<0.994931] yes=31,no=32,missing=31
          31:[inp12<0.999151] yes=63,no=64
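
One possible approach to the conversion asked about above is a small Python script that parses the dump and emits nested if/else code. The sketch below rests on several assumptions not confirmed by the excerpt: feature names always have the form `inp<k>` and map to index `k` of an input array, leaf lines look like `N:leaf=value` with no extra statistics, and missing values never occur (so the `missing=` branch is ignored). The emitted code is plain C; an MQL4 version would need its own array-parameter syntax. The function names and the leaf values in the sample are hypothetical.

```python
import re

# Split node, e.g. "0:[inp0<6.85417] yes=1,no=2,missing=1"
SPLIT_RE = re.compile(
    r"^(\d+):\[inp(\d+)<([^\]]+)\] yes=(\d+),no=(\d+),missing=(\d+)$")
# Leaf node, e.g. "3:leaf=0.1" (assumed format, no cover/gain statistics)
LEAF_RE = re.compile(r"^(\d+):leaf=(\S+)$")


def parse_dump(text):
    """Return a list of trees; each tree maps node id -> split or leaf tuple."""
    trees = []
    for raw in text.splitlines():
        line = raw.strip()  # indentation only mirrors depth, so drop it
        if not line:
            continue
        if line.startswith("booster["):
            trees.append({})
            continue
        match = SPLIT_RE.match(line)
        if match:
            node, feat, thr, yes, no, _missing = match.groups()
            # Thresholds stay as strings so no digits are lost in conversion.
            trees[-1][int(node)] = ("split", int(feat), thr, int(yes), int(no))
            continue
        match = LEAF_RE.match(line)
        if match:
            trees[-1][int(match.group(1))] = ("leaf", match.group(2))
            continue
        raise ValueError(f"unrecognized dump line: {line!r}")
    return trees


def emit_node(tree, node_id, depth):
    """Recursively emit C statements for the subtree rooted at node_id."""
    pad = "    " * depth
    node = tree[node_id]
    if node[0] == "leaf":
        return [f"{pad}return {node[1]};"]
    _, feat, thr, yes, no = node
    lines = [f"{pad}if (inp[{feat}] < {thr}) {{"]
    lines += emit_node(tree, yes, depth + 1)
    lines.append(f"{pad}}} else {{")
    lines += emit_node(tree, no, depth + 1)
    lines.append(f"{pad}}}")
    return lines


def emit_c(trees):
    """Emit one C function per tree plus a function that sums their outputs."""
    out = []
    for i, tree in enumerate(trees):
        out.append(f"double tree_{i}(const double *inp) {{")
        out += emit_node(tree, 0, 1)
        out.append("}")
    calls = " + ".join(f"tree_{i}(inp)" for i in range(len(trees))) or "0.0"
    # The sum is the raw margin; base score and link function are not applied.
    out.append("double predict_margin(const double *inp) {")
    out.append(f"    return {calls};")
    out.append("}")
    return "\n".join(out)


# Tiny sample in the same layout; the leaf values are hypothetical.
SAMPLE_DUMP = """booster[0]:
0:[inp0<6.85417] yes=1,no=2,missing=1
\t1:leaf=0.1
\t2:leaf=-0.2
booster[1]:
0:[inp10<1.00054] yes=1,no=2,missing=1
\t1:leaf=0.05
\t2:leaf=-0.05
"""

trees = parse_dump(SAMPLE_DUMP)
c_code = emit_c(trees)
print(c_code)
```

For a real 50,000-line file, the same functions could be applied to the file's contents and the result written to a source file. Whether the summed margin needs a base score or a sigmoid on top depends on the training objective, which the excerpt does not state.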

Access train and evaluation error in xgboost

Question

I started using the Python xgboost package. Is there a way to get training and validation errors at each training epoch? I can't find one in the documentation.

I have trained a simple model and got this output:

[09:17:37] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 124 extra nodes, 0 pruned nodes, max_depth=6
[0] eval-rmse:0.407474 train-rmse:0.346349
[09:17:37] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 116 extra nodes, 0 pruned nodes, max_depth=6
[1] eval-rmse:0.410902 train
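
A way to recover the per-round errors using nothing but the printed output is to parse the `[round] dataset-metric:value` lines. The sketch below assumes that line format (taken from the excerpt, with the round index in brackets) and uses hypothetical function names. It is a workaround rather than the library's own mechanism: `xgboost.train` also accepts an `evals_result` dictionary argument that is filled with the same per-round metrics during training, which avoids log parsing entirely.

```python
import re

# A metrics line starts with "[<round>]"; timestamps like "[09:17:37]" do not
# match because they contain colons inside the brackets.
ROUND_RE = re.compile(r"^\[(\d+)\]\s+(.*)$")
# One "dataset-metric:value" entry, e.g. "eval-rmse:0.407474".
METRIC_RE = re.compile(r"(\w+)-(\w+):([-+0-9.eE]+)")


def parse_eval_log(log_text):
    """Return {dataset: {metric: [(round, value), ...]}} from training output."""
    history = {}
    for line in log_text.splitlines():
        match = ROUND_RE.match(line.strip())
        if not match:
            continue  # skip pruning messages and anything else
        round_index = int(match.group(1))
        for dataset, metric, value in METRIC_RE.findall(match.group(2)):
            series = history.setdefault(dataset, {}).setdefault(metric, [])
            series.append((round_index, float(value)))
    return history


# Log text as quoted in the question; the last line is truncated there,
# so only its eval-rmse entry can be recovered.
SAMPLE_LOG = """\
[09:17:37] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 124 extra nodes, 0 pruned nodes, max_depth=6
[0] eval-rmse:0.407474 train-rmse:0.346349
[09:17:37] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 116 extra nodes, 0 pruned nodes, max_depth=6
[1] eval-rmse:0.410902 train
"""

history = parse_eval_log(SAMPLE_LOG)
print(history)
```

Keeping the round index next to each value means a truncated or missing entry in one series does not silently shift the others.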

How to use XGboost in PySpark Pipeline

Question

I want to update my PySpark code. In PySpark, the base model has to be placed in a pipeline, and the official pipeline demo uses LogisticRegression as the base model. However, it does not seem possible to use an XGBoost model in the pipeline API. How can I use PySpark like this?

from xgboost import XGBClassifier
...
model = XGBClassifier()
model.fit(X_train, y_train)
pipeline = Pipeline(stages=[..., model, ...])
...

The pipeline API is convenient, so can anybody give some advice?

Multi-Class Prediction with XGBoost

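1. Data preprocessing

Fill in missing values.
Add derived variables based on the business context, such as ratios, binning into levels, pivoting top-N values into columns, and so on.
Remove indicators that the business context makes irrelevant.
Apply one-hot encoding to discrete indicators.

2. Model selection

Models that can perform multi-class prediction include logistic regression, decision trees, neural networks, random forests, and XGBoost. The best performers turned out to be, in order, XGBoost, random forest, and decision tree.

3. Calling the model

Using the relevant Python package, tune the parameters of the XGBoost classifier to improve model performance.

# Import the package
from xgboost.sklearn import XGBClassifier

# Call XGBClassifier; the parameter values in parentheses are mostly defaults and can be tuned
XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, gamma=0, learning_rate=0.1, max_delta_step=0, max_depth=8, min_child_weight=1, missing=None, n_estimators=100, n_jobs=1, nthread =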
