I am trying to use the quantile regression forest function in R (quantregForest), which is built on the randomForest package. I am getting a type mismatch error that I can't quite
@mgoldwasser is right in general, but there is also a very nasty bug in predict.randomForest: even if the training and prediction sets have exactly the same levels, you can still get this error. This happens when a factor has NA embedded as a separate level. The problem is that predict.randomForest essentially does the following:
# Assume your original factor has two "proper" levels + NA level:
f <- factor(c(0,1,NA), exclude=NULL)
length(levels(f)) # => 3
levels(f) # => "0" "1" NA
# Note that
sum(is.na(f)) # => 0
# i.e., the values of the factor are not `NA`; only the corresponding level is.
# Internally, predict.randomForest passes the factor from the training set
# through the function `factor(.)`.
# Unfortunately, it does _not_ do this for the factor from the prediction set.
# See what happens to f if we do that:
pf <- factor(f)
length(levels(pf)) # => 2
levels(pf) # => "0" "1"
# In other words:
length(levels(f)) != length(levels(factor(f)))
# => sad but TRUE
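The mismatch can be reproduced without randomForest at all. The sketch below is a hypothetical stand-in for the check (`levels_check` is an illustrative name, not the package's code), assuming, as described above, that only the training factor is re-run through `factor()` before its level count is compared with the prediction factor's.

```r
# Hypothetical stand-in for the level check described above; this is NOT
# the actual randomForest source, only the behaviour the answer describes.
levels_check <- function(train_col, new_col) {
  n_train <- length(levels(factor(train_col)))  # training factor is re-factored
  n_new   <- length(levels(new_col))            # prediction factor is used as-is
  n_train == n_new
}

# Same values and same levels, with NA embedded as a level in both sets
train_f <- factor(c(0, 1, NA), exclude = NULL)
test_f  <- factor(c(1, NA, 0), exclude = NULL)

identical(levels(train_f), levels(test_f))  # TRUE: the levels really are identical
levels_check(train_f, test_f)                # FALSE: re-factoring dropped the NA level
```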
So the NA level is always dropped from the training-set factor but kept in the prediction-set factor, which means the prediction set always appears to have one extra level and the check fails.
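To find which columns are affected before training, one option is to look for factor columns whose levels (not just values) contain `NA`. The helper name `na_level_columns` is hypothetical; the sketch assumes the predictors live in a data frame, and uses `exclude = NULL` (base R's `addNA()` produces the same kind of level) to build an affected column.

```r
# Hypothetical helper: names of factor columns whose levels include a real NA
# (e.g. created with exclude = NULL or addNA()), which factor() would drop.
na_level_columns <- function(df) {
  has_na_level <- vapply(
    df,
    function(col) is.factor(col) && anyNA(levels(col)),
    logical(1)
  )
  names(df)[has_na_level]
}

train <- data.frame(
  x = factor(c("a", "b", NA), exclude = NULL),  # NA embedded as a level
  y = factor(c("a", "b", NA)),                  # ordinary factor: NA is a value, not a level
  z = c(1.5, 2.5, 3.5)                          # numeric column, ignored
)
na_level_columns(train)  # "x"
```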
A workaround is to rename the NA level to the string "NA" before calling randomForest (in both the training and the prediction data):
levels(f)[is.na(levels(f))] <- "NA"
levels(f) # => "0" "1" "NA"
# note that the third level is now the string "NA", not a missing value
Now calling factor(f) won't discard the level, and the check succeeds.
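The same rename can be applied to every factor column of a data frame. This is a sketch under the assumption that predictors are stored in data frames; `rename_na_levels` is a hypothetical helper, and it is applied to both the training and the prediction data so that their levels stay identical.

```r
# Hypothetical helper: rename an embedded NA level to the string "NA" in every
# factor column, so that factor() no longer drops it.
rename_na_levels <- function(df) {
  for (nm in names(df)) {
    col <- df[[nm]]
    if (is.factor(col) && anyNA(levels(col))) {
      levels(col)[is.na(levels(col))] <- "NA"
      df[[nm]] <- col
    }
  }
  df
}

train <- data.frame(x = factor(c("a", "b", NA), exclude = NULL))
test  <- data.frame(x = factor(c("b", NA, "a"), exclude = NULL))

# Apply the same fix to both sets
train_fixed <- rename_na_levels(train)
test_fixed  <- rename_na_levels(test)

levels(factor(train_fixed$x))                                   # "a" "b" "NA"
identical(levels(factor(train_fixed$x)), levels(test_fixed$x))  # TRUE
```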