Conditional filter of grouped factors - dplyr

不想你离开。 提交于 2019-12-12 09:43:40

问题


Say I have this sort of dataframe:

   day value group type id
1    1   0.1     A    X  1
2    1   0.4     A    Y  1
3    2   0.2     A    X  3
4    2   0.5     A    Y  3
5    3   0.3     A    X  5
6    3   0.2     A    Y  6
7    1   0.1     B    X  3
8    1   0.3     B    Y  3
9    2   0.1     B    X 11
10   2   0.4     B    Y 10
11   3   0.2     B    X 12
12   3   0.3     B    Y 12
13   1   0.1     C    X 12
14   1   0.3     C    Y 12
15   2   0.3     C    X  5
16   2   0.2     C    Y  5
17   3   0.2     C    X  3
18   3   0.2     C    Y  2

Data:

library(dplyr)
df1 <- data.frame(
day = rep(1:3,6),
value = c(0.1,0.2,0.3,0.4,0.5,0.2,0.1,0.1,0.2,0.3,0.4,0.3, 0.1,0.3,0.2,0.3,0.2,0.2),
group = rep(LETTERS[1:3], each=6)
) %>% 
  arrange(group,day) %>% 
  mutate(type=rep(LETTERS[24:25],9),
         id = c(1,1,3,3,5,6,3,3,11,10,12,12,12,12,5,5,3,2))  

df1

I want to filter this dataframe based upon a conditional filter. I want to group_by(day, group) and if all id within each grouping are all equal, I want to filter out all rows of type Y but keep rows of type X.

I can do this by running a loop or through several steps of dataframe subsetting, but I was wondering if there is a one/two liner in dplyr or data.table that I am somehow overlooking.

This would be the desired output:

   day value group type id
1    1   0.1     A    X  1
3    2   0.2     A    X  3
5    3   0.3     A    X  5
6    3   0.2     A    Y  6
7    1   0.1     B    X  3
9    2   0.1     B    X 11
10   2   0.4     B    Y 10
11   3   0.2     B    X 12
13   1   0.1     C    X 12
15   2   0.3     C    X  5
17   3   0.2     C    X  3
18   3   0.2     C    Y  2

回答1:


Here is an one-liner with data.table.

We convert the 'data.frame' to 'data.table' (setDT(df1)), grouped by 'day', 'group', if the length of unique elements of 'id' is 1, we get the Subset of Data.table (.SD) rows where 'type' is 'X' or else get the .SD.

library(data.table)#v1.9.6+
setDT(df1)[, if(uniqueN(id)==1) .SD[type=='X'] else .SD, .(day, group)]
#    day group value type id
# 1:   1     A   0.1    X  1
# 2:   2     A   0.2    X  3
# 3:   3     A   0.3    X  5
# 4:   3     A   0.2    Y  6
# 5:   1     B   0.1    X  3
# 6:   2     B   0.1    X 11
# 7:   2     B   0.4    Y 10
# 8:   3     B   0.2    X 12
# 9:   1     C   0.1    X 12
#10:   2     C   0.3    X  5
#11:   3     C   0.2    X  3
#12:   3     C   0.2    Y  2

Or if the 'type' is already ordered as in the example data

unique(setDT(df1), by = c('day', 'group', 'id'))

If not ordered,

unique(setDT(df1)[order(group,day, id, type)],by = c('day', 'group' , 'id'))

data

df1 <- structure(list(day = c(1L, 1L, 2L, 2L, 3L, 3L, 1L, 1L, 
2L, 2L, 
3L, 3L, 1L, 1L, 2L, 2L, 3L, 3L), value = c(0.1, 0.4, 0.2, 0.5, 
0.3, 0.2, 0.1, 0.3, 0.1, 0.4, 0.2, 0.3, 0.1, 0.3, 0.3, 0.2, 0.2, 
0.2), group = c("A", "A", "A", "A", "A", "A", "B", "B", "B", 
"B", "B", "B", "C", "C", "C", "C", "C", "C"), type = c("X", "Y", 
"X", "Y", "X", "Y", "X", "Y", "X", "Y", "X", "Y", "X", "Y", "X", 
"Y", "X", "Y"), id = c(1L, 1L, 3L, 3L, 5L, 6L, 3L, 3L, 11L, 10L, 
12L, 12L, 12L, 12L, 5L, 5L, 3L, 2L)), .Names = c("day", "value", 
"group", "type", "id"), class = "data.frame", 
row.names = c(NA, -18L))



回答2:


Similar to P Lapointe, I had the following. I initially wanted to use all() to check if all id values are identical or not, but that attempt failed. So, I chose to use diff(). Using mutate() I checked if all id values are identical or not for each group. Then, I chose rows which does not have the combination of check == TRUE and type == "Y". Finally, I removed the check column.

group_by(df1, day, group) %>%
mutate(check = any(diff(id) == 0)) %>%
filter(!(check == TRUE & type == "Y")) %>%
select(-check)

#     day value  group  type    id
#   (int) (dbl) (fctr) (chr) (dbl)
#1      1   0.1      A     X     1
#2      2   0.2      A     X     3
#3      3   0.3      A     X     5
#4      3   0.2      A     Y     6
#5      1   0.1      B     X     3
#6      2   0.1      B     X    11
#7      2   0.4      B     Y    10
#8      3   0.2      B     X    12
#9      1   0.1      C     X    12
#10     2   0.3      C     X     5
#11     3   0.2      C     X     3
#12     3   0.2      C     Y     2

EDIT

After communicating with akrun, I revised the code above. Here it is.

group_by(df1, day, group) %>%
mutate(check = n_distinct(id) == 1) %>%
filter(!(check == TRUE & type == "Y")) %>%
select(-check)



回答3:


Here's my solution with dplyr.

df1%>%
group_by(day,group)%>%
mutate(len=length(unique(id))==1)%>%
filter(ifelse(len==TRUE,type=="X",type %in% c("X","Y")))

Source: local data frame [12 x 6]
Groups: day, group

   day value group type id   len
1    1   0.1     A    X  1  TRUE
2    2   0.2     A    X  3  TRUE
3    3   0.3     A    X  5 FALSE
4    3   0.2     A    Y  6 FALSE
5    1   0.1     B    X  3  TRUE
6    2   0.1     B    X 11 FALSE
7    2   0.4     B    Y 10 FALSE
8    3   0.2     B    X 12  TRUE
9    1   0.1     C    X 12  TRUE
10   2   0.3     C    X  5  TRUE
11   3   0.2     C    X  3 FALSE
12   3   0.2     C    Y  2 FALSE



回答4:


Try this:

 df1 %>% group_by(day,group) %>% distinct(id)

Your original data is ordered, so it is ok, otherwise try

df1 %>% group_by(day, group) %>%
    arrange(group, day, type) %>% distinct(id) %>%
    ungroup %>%‌ ​arrange(group, day, type, id)


来源:https://stackoverflow.com/questions/33715660/conditional-filter-of-grouped-factors-dplyr

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!