Remove rows where all variables are NA using dplyr

北荒 2020-12-08 07:18

I'm having some issues with a seemingly simple task: to remove all rows where all variables are NA using dplyr. I know it can be done using base R (Re

6 Answers
  •  醉酒成梦
    2020-12-08 08:05

    Starting with dplyr 1.0, the colwise vignette gives a similar case as an example:

    filter(across(everything(), ~ !is.na(.x))) #Remove rows with *any* NA
    

    We can see it uses the same implicit "& logic" that filter uses with multiple expressions. So the following minor adjustment selects the rows in which every variable is NA:

    filter(across(everything(), ~ is.na(.x))) #Remove rows with *any* non-NA
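
    To make the implicit & concrete, here is a minimal sketch on made-up data (df and its columns a and b are hypothetical). It builds the per-column logical table and then collapses it row-wise with &, which is roughly what filter does with the result of across:

    ```r
    library(dplyr)

    # Hypothetical example data: row 2 is all NA, rows 3 and 4 are partly NA
    df <- tibble(
      a = c(1, NA, 3, NA),
      b = c("x", NA, NA, "w")
    )

    # Per-column test: one logical column per variable
    na_tbl <- df %>% mutate(across(everything(), ~ is.na(.x)))

    # Row-wise AND over the logical columns, i.e. "every variable is NA"
    all_na <- Reduce(`&`, na_tbl)
    all_na  # FALSE TRUE FALSE FALSE
    ```

    Only row 2 passes, because it is the only row where the test is TRUE in every column.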
    

    But the question asks for the inverse set: Remove rows with all NA.

    1. We can do a simple setdiff using the previous, or
    2. we can use the fact that across returns a logical tibble and filter effectively does a row-wise all() (i.e. &).

    For example:

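    library(magrittr) # provides the %<>% assignment pipe used below
    rowAny = function(x) apply(x, 1, any) # TRUE for rows with at least one TRUE
    anyVar = function(fcn) rowAny(across(everything(), fcn)) #make it readable
    df %<>% filter(anyVar(~ !is.na(.x))) #Remove rows with *all* NA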
    

    Or:

    filterout = function(df, ...) setdiff(df, filter(df, ...))
    df %<>% filterout(across(everything(), is.na)) #Remove rows with *all* NA
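
    A caveat worth checking with the setdiff approach: dplyr's set operations drop duplicate rows, so filterout also collapses duplicates among the rows it keeps. A minimal sketch on made-up data (df, a and b are hypothetical, and the explicit is.na(a) & is.na(b) predicate stands in for the across call):

    ```r
    library(dplyr)

    # filterout() as defined above: keep the rows that filter() would NOT select
    filterout <- function(df, ...) setdiff(df, filter(df, ...))

    # Hypothetical data: rows 1 and 2 are identical, row 3 is all NA
    df <- tibble(a = c(1, 1, NA), b = c("x", "x", NA))

    kept <- filterout(df, is.na(a) & is.na(b))
    nrow(kept)  # 1, not 2: the duplicated row is collapsed as well

    # Negating the predicate directly keeps duplicates intact
    kept_dups <- filter(df, !(is.na(a) & is.na(b)))
    nrow(kept_dups)  # 2
    ```

    If duplicate rows matter in your data, the negated predicate (or the anyVar helper) is probably the safer choice.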
    

    Or even combine the two above to express the first example more directly:

    df %<>% filterout(anyVar(~ is.na(.x))) #Remove rows with *any* NA
    

    In my opinion, the tidyverse filter function would benefit from a parameter describing the 'aggregation logic'. It could default to "all" and preserve behavior, or allow "any" so we wouldn't need to write anyVar-like helper functions.
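
    For what it's worth, dplyr 1.0.4 and later ship if_any() and if_all(), which cover much of this "aggregation logic" need without helper functions. A minimal sketch, assuming dplyr >= 1.0.4 and a made-up df (the data and the result names are hypothetical):

    ```r
    library(dplyr)

    # Hypothetical example data: row 2 is all NA, rows 3 and 4 are partly NA
    df <- tibble(a = c(1, NA, 3, NA), b = c("x", NA, NA, "w"))

    # Keep rows with at least one non-NA value, i.e. remove rows with *all* NA
    not_all_na <- df %>% filter(if_any(everything(), ~ !is.na(.x)))

    # Keep rows with no NA at all, i.e. remove rows with *any* NA
    complete <- df %>% filter(if_all(everything(), ~ !is.na(.x)))

    nrow(not_all_na)  # 3
    nrow(complete)    # 1
    ```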
