Remove rows where all variables are NA using dplyr

北荒 2020-12-08 07:18

I'm having some issues with a seemingly simple task: to remove all rows where all variables are NA using dplyr. I know it can be done using base R (Re

6 Answers

醉酒成梦 (original poster)

2020-12-08 08:05
Starting with dplyr 1.0, the colwise vignette gives a similar case as an example:
```
filter(across(everything(), ~ !is.na(.x))) #Remove rows with *any* NA
```
This relies on the same implicit AND that filter() applies when given multiple expressions: a row is kept only if the condition holds in every column. So the following minor adjustment keeps only the rows in which every value is NA:
```
filter(across(everything(), ~ is.na(.x))) #Remove rows with *any* non-NA
```
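To make the two snippets above concrete, here is a minimal sketch on a made-up data frame. The tibble `df` and its columns `a` and `b` are assumptions for illustration and do not come from the question. Recent dplyr releases may emit a deprecation warning for across() inside filter(), but the calls still run.

```r
library(dplyr)

# Hypothetical toy data: row 1 is complete, row 2 is entirely NA,
# rows 3 and 4 are partly NA.
df <- tibble(
  a = c(1, NA, 3, NA),
  b = c("x", NA, NA, "w")
)

# Keep rows where every column is non-NA (implicit AND across columns).
complete_rows <- df %>% filter(across(everything(), ~ !is.na(.x)))

# Keep rows where every column is NA.
all_na_rows <- df %>% filter(across(everything(), ~ is.na(.x)))

nrow(complete_rows)  # 1
nrow(all_na_rows)    # 1
```

Neither result is what the question wants. Rows 3 and 4 should be kept as well, which is why the condition has to be combined with "any" rather than "all".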
But the question asks for the inverse set: remove rows where all variables are NA.
1. We can take a simple setdiff against the previous result, or
2. we can use the fact that across() returns a logical tibble, which filter() effectively reduces with a row-wise all() (i.e. &), and apply a row-wise any() instead.
E.g.:
```
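library(dplyr)
library(magrittr) # provides the %<>% assignment pipe
rowAny = function(x) apply(x, 1, any) # row-wise any() over a logical tibble
anyVar = function(fcn) rowAny(across(everything(), fcn)) #make it readable
df %<>% filter(anyVar(~ !is.na(.x))) #Remove rows with *all* NA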
```
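To see what the row-wise reduction operates on, it can help to materialise the logical tibble outside of filter(). The sketch below does that with mutate() on a made-up data frame; the names `df`, `not_na`, `row_all` and `row_any` are illustrative assumptions.

```r
library(dplyr)

# Hypothetical toy data: row 2 is entirely NA.
df <- tibble(
  a = c(1, NA, 3, NA),
  b = c("x", NA, NA, "w")
)

# One logical column per original column: TRUE where the value is present.
not_na <- df %>% mutate(across(everything(), ~ !is.na(.x)))

# Row-wise AND: what filter(across(...)) effectively uses.
row_all <- unname(apply(not_na, 1, all))

# Row-wise OR: what the rowAny() helper computes.
row_any <- unname(apply(not_na, 1, any))

row_all  # TRUE FALSE FALSE FALSE
row_any  # TRUE FALSE  TRUE  TRUE
```

Filtering on `row_any` drops only the second row, the one in which every variable is NA.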
Or, using a set difference:
```
filterout = function(df, ...) setdiff(df, filter(df, ...))
df %<>% filterout(across(everything(), is.na)) #Remove rows with *all* NA
```
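One caveat with the setdiff approach: dplyr's setdiff() treats data frames as sets of rows, so duplicated rows in the input are collapsed in the result. The sketch below illustrates this on made-up data (`df`, `via_setdiff` and `via_negation` are illustrative names) and writes the condition out by column to keep the example short.

```r
library(dplyr)

# Hypothetical data: rows 1 and 2 are identical, row 3 is entirely NA.
df <- tibble(
  a = c(1, 1, NA),
  b = c("x", "x", NA)
)

filterout <- function(df, ...) setdiff(df, filter(df, ...))

# Set difference: the all-NA row is removed, and the duplicated
# row is reduced to a single copy as a side effect.
via_setdiff <- filterout(df, is.na(a) & is.na(b))

# Negating the condition removes the all-NA row and keeps both copies.
via_negation <- filter(df, !(is.na(a) & is.na(b)))

nrow(via_setdiff)   # 1
nrow(via_negation)  # 2
```

If duplicate rows carry meaning in the data, negating the condition is probably the safer choice.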
Or even combine the two approaches above to express the first example more directly:
```
df %<>% filterout(anyVar(~ is.na(.x))) #Remove rows with *any* NA
```
In my opinion, the tidyverse filter function would benefit from a parameter describing the 'aggregation logic'. It could default to "all" and preserve behavior, or allow "any" so we wouldn't need to write anyVar-like helper functions.
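For what it's worth, dplyr 1.0.4 and later ship if_any() and if_all(), which cover much of this "aggregation logic" without helper functions. A minimal sketch on a made-up data frame (the tibble `df` and the result names are assumptions for illustration):

```r
library(dplyr)

# Hypothetical toy data: row 1 is complete, row 2 is entirely NA.
df <- tibble(
  a = c(1, NA, 3, NA),
  b = c("x", NA, NA, "w")
)

# "any" logic: keep rows with at least one non-NA value,
# i.e. remove rows where all variables are NA.
not_all_na <- df %>% filter(if_any(everything(), ~ !is.na(.x)))

# "all" logic: keep rows with no NA at all,
# i.e. remove rows with any NA.
no_na <- df %>% filter(if_all(everything(), ~ !is.na(.x)))

nrow(not_all_na)  # 3
nrow(no_na)       # 1
```

The first call answers the original question directly, assuming a dplyr version that provides if_any().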