Quickly remove zero variance variables from a data.frame

独厮守ぢ 2020-12-13 01:07

I have a large data.frame that was generated by a process outside my control, which may or may not contain variables with zero variance (i.e. all the observations are the same)...

8 Answers
  •  旧时难觅i
    2020-12-13 01:26

    Try this custom function. I have not tested it on data frames with 100+ variables.

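    library(dplyr)  # %>% and select_if()
    library(purrr)  # map_dbl()

    remove_low_variance_cols <- function(df, threshold = 0) {
      start <- Sys.time()  # see how long this takes to run
      # Variance of every numeric column, as a named numeric vector
      variances <- df %>%
        select_if(is.numeric) %>%
        map_dbl(var)
      # Columns at or below the threshold; NA variances (e.g. columns with NAs) are kept
      remove_cols <- names(variances)[!is.na(variances) & variances <= threshold]

      if (length(remove_cols) > 0) {
        print("Removing the following columns: ")
        print(remove_cols)
      } else {
        print("There are no low variance columns with this threshold")
      }
      # How long did this take? Fix the units so the label is always right
      elapsed <- as.numeric(difftime(Sys.time(), start, units = "secs"))
      print(paste("Time consumed:", round(elapsed, 3), "secs."))
      # drop = FALSE keeps a data.frame even when a single column remains
      df[, setdiff(names(df), remove_cols), drop = FALSE]
    }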
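
For comparison, the same idea can be sketched in base R with no package dependencies. This is not the answerer's method. The helper name `drop_constant_cols` and the toy data are hypothetical, chosen only for illustration. It assumes that non-numeric columns should always be kept and that NA values should be ignored when computing the variance.

```r
# Hypothetical helper (not from the answer above): base R only, no packages.
drop_constant_cols <- function(df, threshold = 0) {
  # Variance of each numeric column; non-numeric columns get NA and are kept
  variances <- vapply(
    df,
    function(col) if (is.numeric(col)) var(col, na.rm = TRUE) else NA_real_,
    numeric(1)
  )
  # Keep columns whose variance is unknown (NA) or above the threshold
  keep <- is.na(variances) | variances > threshold
  # drop = FALSE keeps a data.frame even if only one column survives
  df[, keep, drop = FALSE]
}

# Toy data: "b" is constant, "c" is a non-numeric column
toy <- data.frame(a = c(1, 2, 3), b = c(5, 5, 5), c = c("x", "y", "z"))
result <- drop_constant_cols(toy)
print(names(result))  # "a" "c"
```

Because `vapply` makes a single pass over the columns, this should scale reasonably to wide data frames, though that has not been measured here. A small positive `threshold` may help if floating-point noise leaves a nominally constant column with a tiny non-zero variance.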
    
