gbm

doRedis/foreach GBM parallel processing error in R

Submitted by 柔情痞子 on 2021-01-28 15:04:29
Question: I am running a gbm model using the caret package and trying to get it working with parallel processing via the doRedis package. I can get the backend workers up and running, but I am having issues when they recombine into the final model. I get this error: Error in foreach(j = 1:12, .combine = sum, .multicombine = TRUE) %dopar% : target of assignment expands to non-language object. This is my first time trying to run a foreach loop (let alone on a complex problem like gbm) and…
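A minimal sketch of one way to set this up (not the poster's code; it assumes a local Redis server, and the queue name, worker count, and data are placeholders): rather than wrapping gbm in a hand-written foreach loop, register the doRedis backend and let caret's train() parallelize the resampling itself.

```r
library(doRedis)
library(caret)

registerDoRedis("jobs")                    # queue name is arbitrary
startLocalWorkers(n = 4, queue = "jobs")   # spawn workers on this machine

# caret drives foreach internally when allowParallel = TRUE
ctrl <- trainControl(method = "cv", number = 5, allowParallel = TRUE)
fit  <- train(Class ~ ., data = training, method = "gbm",
              trControl = ctrl, verbose = FALSE)

removeQueue("jobs")                        # clean up the Redis queue
```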

How to reuse cross_validation_fold_assignment() with GBM in H2o library with Python

Submitted by 倾然丶 夕夏残阳落幕 on 2021-01-28 05:09:10
Question: I run my model with the H2O library, using 5-fold cross-validation. model = H2OGradientBoostingEstimator(balance_classes=True, nfolds=5, keep_cross_validation_fold_assignment=True, seed=1234) model.train(x=predictors, y=response, training_frame=data) print('rmse: ', model.rmse(xval=True)) print('R2: ', model.r2(xval=True)) data_nfolds = model.cross_validation_fold_assignment() I got the cross-validation fold assignment, and I am trying to reuse it for a new model with other parameters such as ntrees or…
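The usual route (sketched here with the h2o R API to match the rest of this page, even though the excerpt above is Python; variable names and parameter values are placeholders) is to attach the saved fold assignment to the training frame and pass it as fold_column, so a second model with different parameters trains on identical folds.

```r
library(h2o)
h2o.init()

m1 <- h2o.gbm(x = predictors, y = response, training_frame = data,
              nfolds = 5, keep_cross_validation_fold_assignment = TRUE,
              seed = 1234)

# frame with one fold index per row (column naming may vary by h2o version)
folds <- h2o.cross_validation_fold_assignment(m1)
data$fold <- folds

# same folds, different parameters: metrics are now directly comparable
m2 <- h2o.gbm(x = predictors, y = response, training_frame = data,
              fold_column = "fold", ntrees = 200, seed = 1234)
```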

Using caret to optimize for deviance with binary classification

Submitted by 与世无争的帅哥 on 2020-05-12 05:17:56
Question: (example borrowed from Fatal error with train() in caret on Windows 7, R 3.0.2, caret 6.0-21) I have this example: library("AppliedPredictiveModeling") library("caret") data("AlzheimerDisease") data <- data.frame(predictors, diagnosis) tuneGrid <- expand.grid(interaction.depth = 1:2, n.trees = 100, shrinkage = 0.1) trainControl <- trainControl(method = "cv", number = 5, verboseIter = TRUE) gbmFit <- train(diagnosis ~ ., data = data, method = "gbm", trControl = trainControl, tuneGrid =…
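One common approach (a sketch, not necessarily the thread's accepted answer): binomial deviance is proportional to log-loss, so caret can tune against it by switching the summary function to mnLogLoss and requesting class probabilities.

```r
# continues the question's example; caret minimizes logLoss automatically
trainControl <- trainControl(method = "cv", number = 5,
                             classProbs = TRUE,          # required by mnLogLoss
                             summaryFunction = mnLogLoss,
                             verboseIter = TRUE)

gbmFit <- train(diagnosis ~ ., data = data, method = "gbm",
                trControl = trainControl, tuneGrid = tuneGrid,
                metric = "logLoss", verbose = FALSE)
```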

How can I export a gbm model in R?

Submitted by 非 Y 不嫁゛ on 2020-02-03 10:55:09
Question: Is there a standard (or available) way to export a gbm model in R? PMML would work, but when I try to use the pmml library, perhaps incorrectly, I get an error. My code looks similar to this: library("gbm") library("pmml") model <- gbm( formula, data = my.data, distribution = "adaboost", n.trees = 450, n.minobsinnode = 10, interaction.depth = 4, shrinkage=0.05, verbose=TRUE) export <- pmml(model) # and then export to xml And the error I get is: Error in UseMethod("pmml") : no…
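The UseMethod error suggests the installed pmml version has no pmml() method for gbm objects (newer pmml releases added one, with limitations). A hedged alternative, assuming the r2pmml package and a Java runtime are available and that the converter supports the chosen distribution, is the JPMML-based route:

```r
library(gbm)
library(r2pmml)   # JPMML-R converter; requires Java

model <- gbm(formula, data = my.data, distribution = "adaboost",
             n.trees = 450, n.minobsinnode = 10,
             interaction.depth = 4, shrinkage = 0.05)

# writes the model to a PMML file; the path is a placeholder
r2pmml(model, "gbm_model.pmml")
```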

GBM R function: get variable importance separately for each class

Submitted by 白昼怎懂夜的黑 on 2020-01-20 16:48:06
Question: I am using the gbm function in R (gbm package) to fit stochastic gradient boosting models for multiclass classification. I simply want to obtain the importance of each predictor separately for each class, as in the figure from the Hastie book (The Elements of Statistical Learning, p. 382). However, summary.gbm only returns the overall importance of the predictors (their importance averaged over all classes). Does anyone know how to get the per-class relative importance values?
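summary.gbm itself offers no per-class breakdown, so one hedged workaround (illustrative only; the data set, column names, and tuning values are placeholders) is to fit a one-vs-rest model per class and collect relative.influence() from each, approximating the per-class panels in Hastie et al.

```r
library(gbm)

classes <- levels(train$y)
imp <- sapply(classes, function(cl) {
  train$target <- as.integer(train$y == cl)   # one-vs-rest binary response
  fit <- gbm(target ~ . - y, data = train,
             distribution = "bernoulli", n.trees = 100,
             interaction.depth = 2, shrinkage = 0.1)
  relative.influence(fit, n.trees = 100)      # per-variable influence
})
imp   # rows = predictors, columns = classes
```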

Ensemble Algorithms: LightGBM

Submitted by 烈酒焚心 on 2020-01-14 19:51:04
1. LightGBM: LightGBM is an ensemble algorithm similar to xgboost. One bottleneck in xgboost is that, for every feature, it must scan all samples at every candidate split point to compute the split gain, which greatly increases the computational load and reduces efficiency. To address this cost on large, high-dimensional data, LightGBM uses a histogram-based method that trades a small amount of precision for faster computation and lower memory consumption. It relies mainly on two techniques: GOSS (Gradient-based One-Side Sampling), which computes gradients on a sample of the data points rather than on all of them, and EFB (Exclusive Feature Bundling), which, instead of scanning every feature for the best split point, bundles certain mutually exclusive features together to reduce the feature dimension and lower the cost of finding the best split. Together these greatly reduce the time complexity of processing the samples, and extensive experiments show that on some data sets LightGBM loses no accuracy and sometimes even improves it. 2. Histogram-based Algorithm: The histogram optimization requires converting feature values into bins before training. For each feature, a piecewise function maps every sample's value on that feature into a segment (bin), ultimately turning continuous feature values into discrete ones. The first for…
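The article shows no code, so here is a minimal illustrative sketch with the lightgbm R package (R to match the rest of this page; the data and parameter values are placeholders): max_bin controls how coarsely feature values are histogrammed, and boosting = "goss" switches on gradient-based one-side sampling.

```r
library(lightgbm)

dtrain <- lgb.Dataset(data.matrix(train_x), label = train_y)

params <- list(objective = "binary",
               boosting = "goss",     # GOSS: sample rows by gradient magnitude
               max_bin = 63,          # fewer histogram bins: faster, less memory
               learning_rate = 0.1)

model <- lgb.train(params, dtrain, nrounds = 100)
```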

Error in R gbm function when cv.folds > 0

Submitted by 随声附和 on 2020-01-10 15:41:53
Question: I am using gbm to predict a binary response. When I set cv.folds=0, everything works well. However, when cv.folds > 1, I get the error Error in object$var.levels[[i]] : subscript out of bounds after the first iteration of cross-validation finishes. Some have said this could be because some factor variables have missing levels in the training or testing data, but I tried using only numeric variables and still get this error. > gbm.fit <- gbm(model.formula, + data=dataall_train, + distribution = "adaboost", + n…
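A hedged sketch of the fix most often suggested for this error (model.formula and dataall_train come from the question; the tuning values are placeholders): restrict the data to the variables actually in the formula and drop unused factor levels, so every CV fold sees the same level sets.

```r
library(gbm)

vars <- all.vars(model.formula)                  # response + predictors only
train_sub <- droplevels(dataall_train[, vars])   # drop empty factor levels

gbm.fit <- gbm(model.formula, data = train_sub,
               distribution = "adaboost",
               n.trees = 500, cv.folds = 5)
```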
