xgboost

The most complete set of 100 BAT algorithm interview questions: Alibaba, Baidu, Tencent, JD, Meituan, Toutiao

流过昼夜 submitted on 2019-12-05 07:51:23
Part 1: Complexity analysis and sorting algorithms (I) — 1) time and space complexity 2) test oracles (对数器) 3) bubble sort 4) selection sort 5) insertion sort 6) how to analyze the time complexity of recursive procedures 7) merge sort 8) the small-sum problem

Part 2: Complexity analysis and sorting algorithms (II) — 1) the Dutch national flag problem 2) randomized quicksort 3) heaps and heapsort 4) stability of sorting algorithms 5) comparators 6) bucket sort 7) counting sort 8) radix sort 9) the maximum-gap problem on a sorted array 10) sorting algorithms in engineering practice

Part 3: Stacks, queues, linked lists, arrays, and matrices — 1) stacks 2) queues 3) linked lists 4) arrays 5) matrices 6) extensions of binary search

Part 4: Binary trees — 1) the binary tree structure 2) recursive and non-recursive traversals 3) printing a binary tree 4) checking for a binary search tree 5) checking for a complete binary tree 6) checking for a balanced binary tree 7) the paper-folding problem 8) predecessor and successor of a node 9) serialization and deserialization of a binary tree

Part 5: Three hash-function-related structures and union-find — 1) hash functions and hash tables 2) Bloom filters in detail 3) consistent hashing 4) the union-find structure and its applications (the islands problem)

Part 6: Graph algorithms — 1) graph representations 2) depth-first and breadth-first traversal 3) topological sorting 4) minimum spanning trees 5) single-source shortest paths

Part 7: Tries, heaps, and greedy algorithms — 1) tries (prefix trees) 2) heap extensions and applications 3) greedy algorithms and related problems 4) how to quickly find a greedy strategy in an interview

Part 8: From brute-force recursion to dynamic programming 1

xgboost load model in c++ (python -> c++ prediction scores mismatch)

心已入冬 submitted on 2019-12-05 03:55:18
Question: I'm reaching out to all SO C++ geniuses. I've trained (and successfully tested) an xgboost model in Python like so:

    dtrain = xgb.DMatrix(np.asmatrix(X_train), label=np.asarray(y_train, dtype=np.int), feature_names=feat_names)
    optimal_model = xgb.train(plst, dtrain)
    dtest = xgb.DMatrix(np.asmatrix(X_test), feature_names=feat_names)
    optimal_model.save_model('sigdet.model')

I've followed a post on XGBoost (see link) which explains the correct way to load and apply prediction in C++: // Load
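Before debugging the C++ side, a quick Python-side check can rule out the saved file as the culprit. A minimal sketch, reusing X_test, feat_names, and optimal_model from the question; everything else is an assumption:

    import numpy as np
    import xgboost as xgb

    # Reload the saved model and confirm it reproduces the original scores.
    bst = xgb.Booster()
    bst.load_model('sigdet.model')
    dtest = xgb.DMatrix(np.asmatrix(X_test), feature_names=feat_names)
    print(np.allclose(bst.predict(dtest), optimal_model.predict(dtest)))
    # If this prints True but the C++ scores still differ, the usual suspects
    # are on the C++ side: features fed in a different column order, or a
    # different 'missing' sentinel (DMatrix treats NaN as missing by default).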

Create RMSLE metric in caret in R

拟墨画扇 submitted on 2019-12-04 21:07:19
Could someone please help me with the following: I need to change my xgboost training model with the caret package to a non-default metric, RMSLE. By default, caret and xgboost train and measure in RMSE. Here are the lines of code:

    # create custom summary function in caret format
    custom_summary = function(data, lev = NULL, model = NULL){
      out = rmsle(data[, "obs"], data[, "pred"])
      names(out) = c("rmsle")
      out
    }
    # create control object
    control = trainControl(method = "cv", number = 2, summaryFunction = custom_summary)
    # create grid of tuning parameters
    grid = expand.grid(nrounds = 100, max_depth = 6, eta = 0
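For comparison, xgboost's own Python API accepts a custom evaluation metric directly, which sidesteps caret entirely. This is an analogous approach rather than the caret answer asked for, and the data names X_train/y_train are illustrative:

    import numpy as np
    import xgboost as xgb

    def rmsle(preds, dtrain):
        # Root mean squared log error; assumes non-negative labels.
        labels = dtrain.get_label()
        preds = np.clip(preds, 0, None)  # guard against negative predictions
        return 'rmsle', float(np.sqrt(np.mean((np.log1p(preds) - np.log1p(labels)) ** 2)))

    dtrain = xgb.DMatrix(X_train, label=y_train)  # X_train, y_train: your data
    bst = xgb.train({'max_depth': 6, 'eta': 0.1}, dtrain,
                    num_boost_round=100, evals=[(dtrain, 'train')], feval=rmsle)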

How to get access to individual trees of an xgboost model in Python/R

元气小坏坏 submitted on 2019-12-04 18:08:17
Question: How do I get access to the individual trees of an xgboost model in Python/R? Below is how I get the trees from a Random Forest with sklearn:

    estimator = RandomForestRegressor(oob_score=True, n_estimators=10, max_features='auto')
    estimator.fit(training_data, training_target)
    tree1 = estimator.estimators_[0]
    leftChild = tree1.tree_.children_left
    rightChild = tree1.tree_.children_right

Answer 1: Do you want to inspect the trees? In Python, you can dump the trees as a list of strings: m = xgb.XGBClassifier(max_depth
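A runnable sketch of the dump approach the answer starts to describe, on synthetic data (the model settings are illustrative):

    import numpy as np
    import xgboost as xgb

    X = np.random.rand(100, 4)
    y = (X[:, 0] > 0.5).astype(int)
    m = xgb.XGBClassifier(max_depth=2, n_estimators=3).fit(X, y)

    # get_dump() returns one string per boosted tree, showing each split's
    # feature, threshold, and the leaf values.
    trees = m.get_booster().get_dump()
    print(len(trees))  # number of trees
    print(trees[0])    # first tree as indented text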

Multi-output regression in xgboost

回眸只為那壹抹淺笑 submitted on 2019-12-04 17:56:33
Question: Is it possible to train a model in xgboost that has multiple continuous outputs (multi-output regression)? What would be the objective for training such a model? Thanks in advance for any suggestions. Answer 1: My suggestion is to use sklearn.multioutput.MultiOutputRegressor as a wrapper of xgb.XGBRegressor. MultiOutputRegressor trains one regressor per target and only requires that the regressor implements fit and predict, which xgboost happens to support.

    # get some noised linear data
    X = np.random
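A self-contained version of the wrapper pattern the answer recommends (the data generation is illustrative):

    import numpy as np
    from sklearn.multioutput import MultiOutputRegressor
    from xgboost import XGBRegressor

    # Noised linear data with two continuous targets.
    rng = np.random.RandomState(0)
    X = rng.rand(200, 3)
    y = X @ rng.rand(3, 2) + 0.05 * rng.randn(200, 2)  # shape (n_samples, 2)

    # MultiOutputRegressor fits one independent XGBRegressor per target column.
    model = MultiOutputRegressor(XGBRegressor(n_estimators=50, max_depth=3))
    model.fit(X, y)
    print(model.predict(X[:5]).shape)  # (5, 2)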

Ensemble Learning

徘徊边缘 submitted on 2019-12-04 15:17:45
Ensemble learning accomplishes a learning task by building and combining multiple learners. The idea is to merge several models to improve performance; compared with a single model, this usually yields better predictions. This is also why ensemble methods are the first thing recommended in high-profile competitions such as the Netflix Prize, KDD Cup, and Kaggle.

Categories: bagging to reduce variance, boosting to reduce bias, and stacking to improve predictive accuracy.

Ensemble methods can also be grouped into two broad classes:

1. Sequential ensemble methods, which generate base models serially (e.g., AdaBoost). The basic motivation is to exploit the dependence between base models: performance is improved by giving misclassified samples larger weights.

2. Parallel ensemble methods, which generate base models in parallel (e.g., Random Forest). The basic motivation is to exploit the independence of the base models, since averaging can greatly reduce error.

What are the differences between Bagging and Boosting?

Sample selection — Bagging: each round's training set is drawn from the original set with replacement, and the training sets of different rounds are independent of one another. Boosting: the training set stays the same in every round; only the weight of each example changes, adjusted according to the previous round's classification results.

Example weights — Bagging: uniform sampling, so every example is weighted equally. Boosting: weights are adjusted continually based on the error rate; the larger the error, the larger the weight.

Prediction functions — Bagging: all prediction functions carry equal weight. Boosting
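To make the bagging-versus-boosting contrast concrete, here is a minimal scikit-learn sketch (synthetic data; the estimator choices are illustrative, not from the article):

    from sklearn.datasets import make_classification
    from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=500, random_state=0)
    stump = DecisionTreeClassifier(max_depth=1)  # weak base learner

    # Parallel ensemble: independent stumps on bootstrap resamples, equal votes.
    bag = BaggingClassifier(stump, n_estimators=50, random_state=0)
    # Sequential ensemble: each stump up-weights the examples the last one missed.
    boost = AdaBoostClassifier(stump, n_estimators=50, random_state=0)

    for name, clf in [("bagging", bag), ("boosting", boost)]:
        print(name, cross_val_score(clf, X, y, cv=5).mean())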

How to obtain a confidence interval or a measure of prediction dispersion when using xgboost for classification?

北战南征 submitted on 2019-12-04 12:32:33
How can one obtain a confidence interval, or some other measure of prediction dispersion, when using xgboost for classification? For example, if xgboost predicts that the probability of an event is 0.9, how can the confidence in that probability be obtained? And is this confidence assumed to be heteroskedastic? Answer: To produce confidence intervals for an xgboost model you should train several models (you can use bagging for this). Each model will produce a response for the test sample, and together the responses form a distribution from which you can compute confidence intervals using basic statistics. You should produce
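A sketch of the bagging recipe the answer outlines, on synthetic data (the model settings and ensemble size are illustrative):

    import numpy as np
    from xgboost import XGBClassifier

    rng = np.random.RandomState(0)
    X = rng.rand(300, 5)
    y = (X[:, 0] + 0.1 * rng.randn(300) > 0.5).astype(int)
    x_new = rng.rand(1, 5)  # the test sample

    # Refit on bootstrap resamples and collect the predicted probabilities.
    probs = []
    for _ in range(30):
        idx = rng.randint(0, len(X), len(X))  # rows drawn with replacement
        clf = XGBClassifier(n_estimators=50, max_depth=3)
        clf.fit(X[idx], y[idx])
        probs.append(clf.predict_proba(x_new)[0, 1])

    lo, hi = np.percentile(probs, [2.5, 97.5])
    print(f"p(event) ~ {np.mean(probs):.3f}, 95% interval [{lo:.3f}, {hi:.3f}]")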

python xgboost continue training on existing model

巧了我就是萌 submitted on 2019-12-04 12:06:49
Let's say I build an xgboost model:

    bst = xgb.train(param0, dtrain1, num_round, evals=[(dtrain, "training")])

where param0 is a set of parameters for xgb, dtrain1 is a DMatrix ready to be trained, and num_round is the number of rounds. Then I save the model to disk:

    bst.save_model("xgbmodel")

Later on, I want to reload the model I saved and continue training it with dtrain2. Does anyone have an idea how to do it? Answer: You don't even have to load the model from disk and retrain. All you need to do is the same xgb.train command with an additional parameter: xgb_model= (either the xgboost model full path name you
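Spelled out, the continuation call the answer describes looks like this (param0, dtrain2, num_round, and the file name come from the question):

    import xgboost as xgb

    # The new boosting rounds are appended to the trees already stored
    # in "xgbmodel", so training continues rather than restarting.
    bst2 = xgb.train(param0, dtrain2, num_round,
                     evals=[(dtrain2, "training")],
                     xgb_model="xgbmodel")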

Parallel threading with xgboost?

这一生的挚爱 submitted on 2019-12-04 12:01:47
According to its documentation, xgboost has an n_jobs parameter. However, when I attempt to set n_jobs, I get this error: TypeError: __init__() got an unexpected keyword argument 'n_jobs'. The same happens for some other parameters like random_state. I assumed this might be a version issue, but it seems I have the latest version (0.6a2, installed with pip). There isn't much needed to reproduce the error:

    from xgboost import XGBClassifier
    estimator_xGBM = XGBClassifier(max_depth=5, learning_rate=0.05, n_estimators=400, n_jobs=-1).fit(x_train)

Any ideas? Answer (stgrmks): I installed xgboost
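For context, and as an assumption about version history rather than something stated in the post: 0.6a2 predates the renaming of the sklearn-wrapper arguments, so the older spellings nthread and seed should work there. A sketch:

    from xgboost import XGBClassifier

    # On xgboost 0.6a2, use nthread/seed instead of n_jobs/random_state.
    clf = XGBClassifier(max_depth=5, learning_rate=0.05,
                        n_estimators=400, nthread=-1, seed=0)
    clf.fit(x_train, y_train)  # x_train/y_train: your data; note that fit
                               # also needs labels, which the question omitted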

Reading the XGBoost Paper

十年热恋 submitted on 2019-12-04 09:16:50
The paper's novel contributions:

- A boosted tree algorithm that can handle sparse data.
- The overall procedure of a weighted quantile sketch method, used to handle instance weights in approximate tree learning.
- A parallel and distributed design that gives the algorithm very fast training speed.
- Out-of-core computation, which lets XGBoost handle even larger datasets.

Objective function / loss function: this loss function cannot be solved with traditional optimization methods in Euclidean space. To work around this, the paper adopts a greedy approach and splits the summed objective into stepwise iterations: step t optimizes only the t-th classifier, with all classifiers of the first t-1 steps held fixed. Performing a second-order Taylor expansion of this expression makes it faster to solve.

Source: https://www.cnblogs.com/xumaomao/p/11852063.html
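The formulas these notes refer to were images in the original post and did not survive extraction. For reference, the standard objective from the XGBoost paper that the text describes (a reconstruction, not the post's own rendering):

    \mathcal{L}(\phi) = \sum_{i} l(\hat{y}_i, y_i) + \sum_{k} \Omega(f_k),
    \qquad \Omega(f) = \gamma T + \tfrac{1}{2}\lambda \lVert w \rVert^2

    \mathcal{L}^{(t)} = \sum_{i=1}^{n} l\left(y_i,\ \hat{y}_i^{(t-1)} + f_t(\mathbf{x}_i)\right) + \Omega(f_t)

    \mathcal{L}^{(t)} \simeq \sum_{i=1}^{n} \left[ l(y_i, \hat{y}_i^{(t-1)}) + g_i f_t(\mathbf{x}_i) + \tfrac{1}{2} h_i f_t^2(\mathbf{x}_i) \right] + \Omega(f_t)

where g_i and h_i are the first and second derivatives of the loss with respect to the previous round's prediction.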