I am very new to machine learning and am attempting the forest cover prediction competition on Kaggle, but I am getting hung up pretty early on. I get the following error wh
I too am doing Kaggle competitions and have been using the 'caret' package to help with choosing the 'best' model parameters. After getting many of these errors, I looked into the scripting behind the scenes and discovered a call to a function called 'class2ind', which does not exist (at least anywhere I know of). I eventually found a similar function called 'class.ind' in the 'nnet' package, so I defined a local 'class2ind' using the code from 'class.ind'. And lo and behold, it worked!
# fix for caret: local stand-in for the missing 'class2ind',
# adapted from nnet::class.ind
class2ind <- function(cl) {
  n <- length(cl)
  cl <- as.factor(cl)
  # one column per factor level, one row per observation
  x <- matrix(0, n, length(levels(cl)))
  # set a 1 in the column matching each observation's level
  x[(1:n) + n * (unclass(cl) - 1)] <- 1
  dimnames(x) <- list(names(cl), levels(cl))
  x
}
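As a quick sanity check, here is a small, self-contained sketch (the function definition repeated with an example factor of my own invention) showing that the patch one-hot encodes a factor as expected:

```r
# Same stand-in function as above
class2ind <- function(cl) {
  n <- length(cl)
  cl <- as.factor(cl)
  x <- matrix(0, n, length(levels(cl)))
  x[(1:n) + n * (unclass(cl) - 1)] <- 1
  dimnames(x) <- list(names(cl), levels(cl))
  x
}

# Example: four observations over three levels
cl <- factor(c("a", "b", "a", "c"))
class2ind(cl)
# Each row contains a single 1, in the column for that observation's level
```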
The following should work:
model1 <- train(as.factor(Cover_Type) ~ Elevation + Aspect + Slope + Horizontal_Distance_To_Hydrology,
data = data.train,
method = "rf", tuneGrid = data.frame(mtry = 3))
It's always better to specify the tuneGrid parameter, a data frame of candidate tuning values. See ?randomForest and ?train for more information. rf has only one tuning parameter, mtry, which controls the number of predictors randomly sampled as split candidates at each node.
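If you pass several mtry values in tuneGrid, caret will resample over each one and keep the best. Here is a sketch using the built-in iris data so it runs standalone (your call would use data.train and your own formula instead; assumes the 'caret' and 'randomForest' packages are installed):

```r
library(caret)

set.seed(1)
# Candidate values for the single rf tuning parameter
grid <- expand.grid(mtry = c(1, 2, 3))

fit <- train(Species ~ ., data = iris,
             method = "rf",
             tuneGrid = grid,
             trControl = trainControl(method = "cv", number = 3))

# The mtry value that won the cross-validation
fit$bestTune
```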
You can also run modelLookup to list the tuning parameters for each model:
> modelLookup("rf")
#   model parameter                         label forReg forClass probModel
# 1    rf      mtry #Randomly Selected Predictors   TRUE     TRUE      TRUE