Remove rows which have all NAs in certain columns

后端 未结 5 1691
面向向阳花
面向向阳花 2020-12-11 08:08

Suppose you have a dataframe with 9 columns. You want to remove cases which have all NAs in columns 5:9. It\'s not at all relevant if there are NAs in columns 1:4.

S

相关标签:
5条回答
  • 2020-12-11 08:18

    This a one-liner to remove the rows with NA in all columns between 5 and 9. By combining rowSums() with is.na() it is easy to check whether all entries in these 5 columns are NA:

    x <- x[rowSums(is.na(x[,5:9]))!=5,]
    
    0 讨论(0)
  • 2020-12-11 08:29

    Here are two dplyr options:

    library(dplyr)
    df <- data_frame(a = c(0, NA, 0, 4, NA, 0, 6), b = c(1, NA, 0, 4, NA, 0, NA), c = c(1, 0, 1, NA, NA, 0, NA))
    
    
    # columns b and c would be the columns you don't want all NAs
    
    df %>% 
      filter_at(vars(b, c), any_vars(!is.na(.)))
    
    df %>% 
      filter_at(vars(b, c), any_vars(complete.cases(.)))
    
    # A tibble: 5 x 3
          a     b     c
      <dbl> <dbl> <dbl>
    1     0     1     1
    2    NA    NA     6
    3     0     6     1
    4     4     4    NA
    5     0     0     0
    
    0 讨论(0)
  • 2020-12-11 08:31

    I don't know that it's any faster than your function, but maybe you could use !any and is.na for each row of your data frame. With this example data:

    set.seed(1234)
    x = do.call(cbind, lapply(1:9, function(x) runif(10)))
    x[sample(length(x), size = 70)] <- NA
    x <- data.frame(x)
    
    > x
         X1 X2   X3   X4   X5   X6   X7   X8  X9
    1  0.11 NA   NA 0.46 0.55 0.07   NA   NA  NA
    2  0.62 NA   NA   NA   NA   NA 0.04   NA  NA
    3    NA NA   NA 0.30   NA   NA   NA 0.01  NA
    4  0.62 NA 0.04 0.51   NA   NA   NA   NA  NA
    5  0.86 NA   NA 0.18   NA   NA   NA   NA 0.2
    6  0.64 NA   NA   NA   NA 0.50   NA 0.52  NA
    7    NA NA   NA   NA 0.68   NA   NA   NA  NA
    8    NA NA   NA   NA   NA   NA   NA   NA  NA
    9    NA NA   NA   NA   NA 0.17   NA   NA  NA
    10   NA NA 0.05   NA   NA   NA   NA   NA  NA
    

    Looks like the 4th, 8th, and 10th rows should be dropped. So, you can use apply to iterate over each row to see if the condition is satisfied- any row where with any values other than NA in the 5th to 9th column will return TRUE, so you can use that as an indexer for your data frame.

    keep.rows <- apply(x[, 5:9], 1, FUN = function(row){
      any(!is.na(row))
    })
    
    > x[keep.rows, ]
        X1 X2 X3   X4   X5   X6   X7   X8  X9
    1 0.11 NA NA 0.46 0.55 0.07   NA   NA  NA
    2 0.62 NA NA   NA   NA   NA 0.04   NA  NA
    3   NA NA NA 0.30   NA   NA   NA 0.01  NA
    5 0.86 NA NA 0.18   NA   NA   NA   NA 0.2
    6 0.64 NA NA   NA   NA 0.50   NA 0.52  NA
    7   NA NA NA   NA 0.68   NA   NA   NA  NA
    9   NA NA NA   NA   NA 0.17   NA   NA  NA
    

    Again, not sure that it's faster than your function but... maybe?

    0 讨论(0)
  • 2020-12-11 08:31
    lines=
       'V1   V2   V3   V4  
        A    10   20   NA   
        B    NA   NA   NA   
        C     5   20   3     
        D    15   20   4    
        E    NA   10   5'
    
    df = read.table(textConnection(lines), header = T)
    
    df[is.na(df)] = 'X'
    
    
    attach(df)
    
    x = subset(df, V2 == 'X'   &   V3 == 'X'   &   V4 == 'X')
    df_new = df[-as.numeric(row.names(x)),]
    df_new
    
    #  V1 V2 V3 V4
    #1  A 10 20  X
    #3  C  5 20  3
    #4  D 15 20  4
    #5  E  X 10  5
    
    detach(df)
    
    0 讨论(0)
  • 2020-12-11 08:39

    You can use all with apply to find rows where all values are NA:

    x[!apply(is.na(x[,5:9]), 1, all),]
    

    or negate is.na and test for any:

    x[apply(!is.na(x[,5:9]), 1, any),]
    

    or using rowSums like @RHertel wher you dont need to calculate the number of selected rows:

    x[rowSums(!is.na(x[,5:9])) > 0,]
    
    0 讨论(0)
提交回复
热议问题