I have a large data.frame that was generated by a process outside my control, which may or may not contain variables with zero variance (i.e. all the observations are the sa
I think having zero variance is equivalent to being constant and one can get around without doing any arithmetic operations at all. I would expect that range() outperforms var(), but I have not verified this:
removeConstantColumns <- function(a_dataframe, verbose=FALSE) {
notConstant <- function(x) {
if (is.factor(x)) x <- as.integer(x)
return (0 != diff(range(x, na.rm=TRUE)))
}
bkeep <- sapply(a_dataframe, notConstant)
if (verbose) {
cat('removeConstantColumns: '
, ifelse(all(bkeep)
, 'nothing'
, paste(names(a_dataframe)[!bkeep], collapse=',')
, ' removed', '\n')
}
return (a_dataframe[, bkeep])
}