Remove rows which have all NAs in certain columns

面向向阳花 2020-12-11 08:08

Suppose you have a data frame with 9 columns. You want to remove the rows that have all NAs in columns 5:9. It does not matter whether there are NAs in columns 1:4.


5 answers
  •  南方客 (original poster)
    2020-12-11 08:31

    I don't know that it's any faster than your function, but you could use any and !is.na on each row of your data frame. With this example data:

    set.seed(1234)
    x = do.call(cbind, lapply(1:9, function(x) runif(10)))
    x[sample(length(x), size = 70)] <- NA
    x <- data.frame(x)
    
    > x
         X1 X2   X3   X4   X5   X6   X7   X8  X9
    1  0.11 NA   NA 0.46 0.55 0.07   NA   NA  NA
    2  0.62 NA   NA   NA   NA   NA 0.04   NA  NA
    3    NA NA   NA 0.30   NA   NA   NA 0.01  NA
    4  0.62 NA 0.04 0.51   NA   NA   NA   NA  NA
    5  0.86 NA   NA 0.18   NA   NA   NA   NA 0.2
    6  0.64 NA   NA   NA   NA 0.50   NA 0.52  NA
    7    NA NA   NA   NA 0.68   NA   NA   NA  NA
    8    NA NA   NA   NA   NA   NA   NA   NA  NA
    9    NA NA   NA   NA   NA 0.17   NA   NA  NA
    10   NA NA 0.05   NA   NA   NA   NA   NA  NA
    

    It looks like the 4th, 8th, and 10th rows should be dropped. You can use apply to iterate over each row and check the condition: any row with at least one non-NA value in the 5th to 9th columns returns TRUE, so the result can be used as a row index for your data frame.

    keep.rows <- apply(x[, 5:9], 1, FUN = function(row){
      any(!is.na(row))
    })
    
    > x[keep.rows, ]
        X1 X2 X3   X4   X5   X6   X7   X8  X9
    1 0.11 NA NA 0.46 0.55 0.07   NA   NA  NA
    2 0.62 NA NA   NA   NA   NA 0.04   NA  NA
    3   NA NA NA 0.30   NA   NA   NA 0.01  NA
    5 0.86 NA NA 0.18   NA   NA   NA   NA 0.2
    6 0.64 NA NA   NA   NA 0.50   NA 0.52  NA
    7   NA NA NA   NA 0.68   NA   NA   NA  NA
    9   NA NA NA   NA   NA 0.17   NA   NA  NA
    

    Again, not sure that it's faster than your function but... maybe?
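
    If speed is the concern, one possible alternative is to count the NAs per row with rowSums instead of calling a function on every row with apply. The following is only a sketch on the same example data, not a benchmarked result; the names cols, all.na, and x.kept are made up for illustration.

    ```r
    # Rebuild the small example: 9 columns with NAs scattered at random.
    set.seed(1234)
    x <- do.call(cbind, lapply(1:9, function(i) runif(10)))
    x[sample(length(x), size = 70)] <- NA
    x <- data.frame(x)

    cols <- 5:9  # columns that decide whether a row is kept (illustrative name)

    # is.na() on a data frame returns a logical matrix;
    # rowSums() then counts the NAs in each row.
    all.na <- rowSums(is.na(x[, cols])) == length(cols)

    # Keep every row that has at least one non-NA value in columns 5:9.
    x.kept <- x[!all.na, ]

    # Sanity check: the apply() version selects exactly the same rows.
    keep.rows <- apply(x[, cols], 1, function(row) any(!is.na(row)))
    stopifnot(identical(unname(keep.rows), unname(!all.na)))
    ```

    Both versions build a logical vector with one entry per row, so either can be used to index the data frame. Whether the rowSums version is actually faster on your data is something you would need to time yourself.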
