adabag boosting function throws error when giving mfinal>10

Submitted by 风格不统一 on 2019-12-22 08:38:27

Question


I have a strange issue: whenever I try increasing the mfinal argument of the boosting function in the adabag package beyond 10, I get an error. Even with mfinal=9 I get warnings.

My training data has a 7-class dependent variable, 100 independent variables, and around 22,000 samples (one class oversampled with SMOTE from the DMwR package). The dependent variable is the last column of the training dataset.

library(adabag)
gc()
exp_recog_boo <- boosting(V1 ~ .,data=train_dataS,boos=TRUE,mfinal=9)

Error in 1:nrow(object$splits) : argument of length 0
In addition: Warning messages:
1: In acum + acum1 :
longer object length is not a multiple of shorter object length

Thanks in advance.


Answer 1:


My mistake was that I hadn't set the TARGET variable as a factor beforehand.

Try this:

train$TARGET <- as.factor(train$TARGET)

and check by doing:

str(train$TARGET)
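A minimal, self-contained sketch of this fix, using the built-in iris data in place of the asker's training set (the dataset and mfinal value here are illustrative, not from the question):

```r
library(adabag)

# Coerce the response to a factor before calling boosting(); adabag's
# class-handling code fails on a numeric or character response.
iris$Species <- as.factor(iris$Species)
str(iris$Species)  # Factor w/ 3 levels "setosa","versicolor","virginica"

fit <- boosting(Species ~ ., data = iris, boos = TRUE, mfinal = 15)
```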



Answer 2:


This worked for me:

modelADA <- boosting(lettr ~ ., data = trainAll, boos = TRUE, mfinal = 10, control = rpart.control(minsplit = 0))

Essentially I just told rpart to require a minimum split size of zero to generate a tree, which eliminated the error. I haven't tested this extensively, so I can't guarantee it's a valid solution (what does a leaf with zero observations actually mean?), but it does stop the error from being thrown.




Answer 3:


I think I hit the problem.

Ignore this: if you configure your control with cp = 0, this won't happen. I think that if the first split of a tree makes no improvement (or at least none better than cp), the tree is left with 0 splits, so you have an empty tree and that makes the algorithm fail.

EDIT: The problem is that rpart generates trees with only one leaf (node), and the boosting method uses the statement k <- varImp(arboles[[m]], surrogates = FALSE, competes = FALSE); when arboles[[m]] is a tree with only one node, this gives you the error.
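The failure mode described here can be reproduced directly with rpart (a sketch; cp = 1 is just a way to force a root-only tree, since it demands an impossible 100% improvement per split):

```r
library(rpart)

# Force rpart to refuse every split, yielding the kind of degenerate
# root-only tree that breaks boosting().
tree <- rpart(Species ~ ., data = iris, control = rpart.control(cp = 1))

length(tree$frame$var)  # 1: the tree is a single leaf
is.null(tree$splits)    # TRUE: no splits were made
# 1:nrow(tree$splits)   # would raise: "argument of length 0"
```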

To solve that you can modify the boosting method:

Run fix(boosting) and add the lines marked with **:

if (boos == TRUE) { 
**   k <- 1
**   while (k == 1){
     boostrap <- sample(1:n, replace = TRUE, prob = pesos)
     fit <- rpart(formula, data = data[boostrap, -1],
         control = control)
**   k <- length(fit$frame$var)
**   }
     flearn <- predict(fit, newdata = data[, -1], type = "class")
     ind <- as.numeric(vardep != flearn)
     err <- sum(pesos * ind)
 }

This will prevent the algorithm from accepting one-leaf trees, but you have to set cp in the control parameter to 0 to avoid an endless loop.




Answer 4:


Just ran into the same problem; setting the complexity parameter to -1 or the minimum split to 0 both work for me with rpart.control, e.g.

library(adabag)

r1 <- boosting(Y ~ ., data = data, boos = TRUE, 
               mfinal = 10,  control = rpart.control(cp = -1))

r2 <- boosting(Y ~ ., data = data, boos = TRUE, 
               mfinal = 10,  control = rpart.control(minsplit = 0))



Answer 5:


I also ran into this same problem recently, and this example R script solved it completely!

The main idea is that you need to set the control for rpart (which adabag uses for creating trees; see rpart.control) appropriately, so that at least one split is attempted in every tree.

I'm not totally sure, but your "argument of length 0" error appears to result from an empty tree, which can happen because the default complexity parameter tells the function not to attempt a split if the decrease in lack of fit is below a certain threshold.
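Putting that together, a hedged sketch (the dataset and parameter values are illustrative, not tuned):

```r
library(adabag)
library(rpart)

# Loosen rpart's stopping rules so every boosting iteration attempts at least
# one split: cp = -1 makes any split count as an improvement, and minsplit = 0
# allows even tiny nodes to be split. maxdepth keeps the trees small.
ctrl <- rpart.control(cp = -1, minsplit = 0, maxdepth = 5)
fit  <- boosting(Species ~ ., data = iris, boos = TRUE, mfinal = 25,
                 control = ctrl)
```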




Answer 6:


Use str() to see the attributes of your data frame. For me, I just converted my class variable to a factor, and then everything ran.



Source: https://stackoverflow.com/questions/16135708/adabag-boosting-function-throws-error-when-giving-mfinal10
