Quickly remove zero variance variables from a data.frame

独厮守ぢ 2020-12-13 01:07

I have a large data.frame that was generated by a process outside my control, which may or may not contain variables with zero variance (i.e. all the observations are the sa

8 answers
  •  死守一世寂寞
    2020-12-13 01:13

    I think zero variance is equivalent to the column being constant, so it can be detected without computing a variance at all. I would expect range() to outperform var(), but I have not verified this:

    removeConstantColumns <- function(a_dataframe, verbose=FALSE) {
      # TRUE when the range of a column (ignoring NAs) has non-zero width
      notConstant <- function(x) {
        if (is.factor(x)) x <- as.integer(x)  # compare the factor codes
        return (0 != diff(range(x, na.rm=TRUE)))
      }
      bkeep <- sapply(a_dataframe, notConstant)
      if (verbose) {
        cat('removeConstantColumns: '
          , ifelse(all(bkeep)
            , 'nothing'
            , paste(names(a_dataframe)[!bkeep], collapse=',')
          ) , ' removed',  '\n')
      }
      # drop=FALSE keeps a data.frame even when only one column remains
      return (a_dataframe[, bkeep, drop=FALSE])
    }
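
    To make the idea concrete, here is a minimal sketch of the same range-based test applied to a small data.frame. The data.frame `toy` and its column names are made up for illustration, and the helper is repeated so the snippet runs on its own. It assumes numeric or factor columns only.

```r
# Illustrative data only: `toy` and its columns are hypothetical.
toy <- data.frame(
  id    = 1:4,                            # varies
  const = rep(7, 4),                      # constant numeric column
  grp   = factor(c("a", "a", "a", "a")),  # constant factor column
  score = c(0.5, NA, 1.5, 2.0)            # varies, contains an NA
)

# Same test as in the answer: keep a column when its range has non-zero width.
notConstant <- function(x) {
  if (is.factor(x)) x <- as.integer(x)    # compare the factor codes
  0 != diff(range(x, na.rm = TRUE))
}

# vapply() guarantees one logical value per column.
keep <- vapply(toy, notConstant, logical(1))

# drop = FALSE keeps the result a data.frame even if one column survives.
reduced <- toy[, keep, drop = FALSE]
print(names(reduced))
```

    Character columns would need a different test (for example, counting distinct values), since diff() is not defined for strings.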
    
