R: Remove multiple empty columns of character variables

轮回少年 2020-12-08 11:06

I have a data frame where all the variables are of character type. Many of the columns are completely empty, i.e. only the variable headers are there, but no values. Is ther

9 answers
  • 2020-12-08 11:28

    A simple solution using the purrr package:

    purrr::discard(my_data_frame, ~all(is.na(.)))
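    If adding a package dependency is not desirable, base R's `Filter()` can play the same role. The following is a minimal sketch on a made-up two-column data frame (the data and the name `kept` are illustrative assumptions, not from the question):

    ```r
    # Hypothetical example data: column b is entirely NA.
    df <- data.frame(
      a = c("x", NA),
      b = c(NA_character_, NA_character_),
      stringsAsFactors = FALSE
    )

    # Filter() keeps the columns for which the predicate returns TRUE,
    # so the test is negated: keep columns that are NOT all NA.
    kept <- Filter(function(x) !all(is.na(x)), df)
    names(kept)
    ```

    Like `purrr::discard()`, this only treats `NA` as empty; columns of `""` would be kept.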

  • 2020-12-08 11:31

    If you know the column indices, you can use

    df[,-c(3, 5, 7)]
    

    This omits columns 3, 5, and 7.
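    Positions shift whenever columns are added or reordered, so dropping by name can be more robust. A small sketch, assuming made-up column names (`a`, `b`, `c`) and a hypothetical `drop_names` vector:

    ```r
    # Hypothetical example data with one unwanted column.
    df <- data.frame(a = "1", b = "", c = "3", stringsAsFactors = FALSE)

    # Names of the columns to remove (an assumption for this example).
    drop_names <- c("b")

    # setdiff() keeps every name not listed in drop_names;
    # drop = FALSE keeps the result a data frame even with one column left.
    df_small <- df[, setdiff(names(df), drop_names), drop = FALSE]
    names(df_small)
    ```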

  • 2020-12-08 11:35

    I have a similar situation -- I'm working with a large public records database but when I whittle it down to just the date range and category that I need, there are a ton of columns that aren't in use. Some are blank and some are NA.

    The selected answer: https://stackoverflow.com/a/17672737/233467 didn't work for me, but this did:

    df[!sapply(df, function (x) all(is.na(x) | x == ""))]
    
  • 2020-12-08 11:35

    It depends what you mean by empty: Is it NA or "", or can it even be " "? Something like this might work:

    df[,!apply(df, 2, function(x) all(gsub(" ", "", x)=="", na.rm=TRUE))]
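    The same idea can be written so that tabs and other whitespace also count as empty, and so that the result stays a data frame when only one column survives. This is a sketch that swaps `gsub()` for `trimws()` and `apply()` for `vapply()`; the example data and the names `is_empty_col` and `cleaned` are assumptions:

    ```r
    # Hypothetical example data: every column is character, as in the question.
    df <- data.frame(
      id    = c("a", "b", "c"),
      blank = c("", " ", NA),
      note  = c("x", "", NA),
      stringsAsFactors = FALSE
    )

    # A cell counts as empty if it is NA or contains only whitespace.
    is_empty_col <- vapply(
      df,
      function(x) all(is.na(x) | trimws(x) == ""),
      logical(1)
    )

    # drop = FALSE keeps a data frame even if only one column survives.
    cleaned <- df[, !is_empty_col, drop = FALSE]
    names(cleaned)
    ```

    Here `blank` is removed, while `note` is kept because it has at least one real value.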
    
  • 2020-12-08 11:39

    You can do either of the following:

    emptycols <- sapply(df, function (k) all(is.na(k)))
    df <- df[!emptycols]
    

    or:

    emptycols <- colSums(is.na(df)) == nrow(df)
    df <- df[!emptycols]
    

    If by empty you mean they are "", the second approach can be adapted like so:

    emptycols <- colSums(df == "") == nrow(df)
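    One caveat: `df == ""` is `NA` wherever a cell is `NA`, so `colSums()` returns `NA` for any column that mixes `""` and `NA`. A sketch of one way to treat both as empty, using made-up data and a hypothetical helper name `is_blank`:

    ```r
    # Hypothetical data: "empty" is a mix of NA and "".
    df <- data.frame(
      keep  = c("a", "", NA),
      empty = c("", NA, ""),
      stringsAsFactors = FALSE
    )

    # TRUE | NA evaluates to TRUE, so NA cells are counted as blank
    # and the logical matrix contains no NA values.
    is_blank  <- is.na(df) | df == ""
    emptycols <- colSums(is_blank) == nrow(df)

    df <- df[!emptycols]
    names(df)
    ```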
    
  • 2020-12-08 11:40

    Here is something that can be modified to exclude columns containing any of the specified values.

    newdf <- df[, apply(df, 2, function(x) !any(is.na(x) | x == "" | x == "-4"))]
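    Note that the snippet above drops a column as soon as any single cell matches. If the goal is to drop only columns that are empty throughout, `all()` is the closer fit. A sketch under that assumption, with made-up data and a hypothetical `missing_codes` vector:

    ```r
    # Hypothetical data; "-4" stands in for a missing-value code.
    df <- data.frame(
      id      = c("a", "b", "c"),
      partial = c("x", "-4", ""),
      unused  = c("-4", "", NA),
      stringsAsFactors = FALSE
    )

    # Values treated as "no data" (an assumption; adjust to your data).
    missing_codes <- c("", "-4")

    # all(): drop a column only when every cell is NA or a missing code.
    all_missing <- vapply(
      df,
      function(x) all(is.na(x) | x %in% missing_codes),
      logical(1)
    )
    newdf <- df[, !all_missing, drop = FALSE]
    names(newdf)
    ```

    With `any()` both `partial` and `unused` would be removed; with `all()` only `unused` is.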
    