问题
I have a tidy tibble with a value column identified by 4 ID columns.
> MWA
# A tibble: 16 x 5
# Groups: Dir [2]
VP Con Dir Seg time_seg
<int> <int> <int> <int> <int>
1 10 2 1 1 1810
2 10 2 1 2 260
3 10 2 1 3 540
4 10 2 1 4 1470
5 10 2 1 5 460
6 10 2 1 6 690
7 10 2 1 7 760
8 10 2 1 8 NA
9 10 2 2 1 320
10 10 2 2 2 1110
11 10 2 2 3 450
12 10 2 2 4 600
13 10 2 2 5 1680
14 10 2 2 6 730
15 10 2 2 7 850
16 10 2 2 8 840
The dput to reproduce is
> dput(MWA)
structure(list(VP = c(10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L,
10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L), Con = c(2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), Dir = c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L),
Seg = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L,
6L, 7L, 8L), time_seg = c(1810L, 260L, 540L, 1470L, 460L,
690L, 760L, NA, 320L, 1110L, 450L, 600L, 1680L, 730L, 850L,
840L)), row.names = c(NA, -16L), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"), vars = "Dir", drop = TRUE, indices = list(
0:7, 8:15), group_sizes = c(8L, 8L), biggest_group_size = 8L, labels = structure(list(
Dir = 1:2), row.names = c(NA, -2L), class = "data.frame", vars = "Dir", drop = TRUE))
They stem from a larger data set, where they have been grouped by VP, Con and finally Dir.
As you can see, in tibble row 10 there is a NA.
I now want to exclude the whole Dir group (so rows 1 trough 8), based on this condition that this one value is missing using dplyr.
Using the filter with is.na or complete.cases only removes the row with the NA, not the complete group (which is one "case" in this dataset).
回答1:
Using all() will evaluate the entire group, so you can skip the mutate step.
MWA %>%
group_by(Dir) %>%
filter(all(!is.na(time_seg)))
# A tibble: 8 x 5
# Groups: Dir [1]
VP Con Dir Seg time_seg
<int> <int> <int> <int> <int>
1 10 2 2 1 320
2 10 2 2 2 1110
3 10 2 2 3 450
4 10 2 2 4 600
5 10 2 2 5 1680
6 10 2 2 6 730
7 10 2 2 7 850
8 10 2 2 8 840
回答2:
You can first check whether there is any missing value in the specific column and then exclude the whole group.
library(dplyr)
MWA %>%
group_by(VP, Con, Dir) %>%
mutate(any_na = any(is.na(time_seg))) %>%
filter(!any_na)
# A tibble: 8 x 6
# Groups: VP, Con, Dir [1]
# VP Con Dir Seg time_seg any_na
# <int> <int> <int> <int> <int> <lgl>
# 1 10 2 2 1 320 FALSE
# 2 10 2 2 2 1110 FALSE
# 3 10 2 2 3 450 FALSE
# 4 10 2 2 4 600 FALSE
# 5 10 2 2 5 1680 FALSE
# 6 10 2 2 6 730 FALSE
# 7 10 2 2 7 850 FALSE
# 8 10 2 2 8 840 FALSE
回答3:
There is anyNA in base R
library(dplyr)
MWA %>%
group_by(Dir) %>%
filter(!anyNA(time_seg))
# A tibble: 8 x 5
# Groups: Dir [1]
# VP Con Dir Seg time_seg
# <int> <int> <int> <int> <int>
#1 10 2 2 1 320
#2 10 2 2 2 1110
#3 10 2 2 3 450
#4 10 2 2 4 600
#5 10 2 2 5 1680
#6 10 2 2 6 730
#7 10 2 2 7 850
#8 10 2 2 8 840
来源:https://stackoverflow.com/questions/52038100/exclude-groups-with-nas-in-tidy-dataset