group data and filter groups by two columns (dplyr)

问题

I have a question regarding using dplyr to filter a dataset.

I want to group data by its RestaurantID and then filter() all groups where the wage >= 5 in Year==1992.

For example:

I have:

 RestaurantID     Year        Wage
     1             92          6
     1             93          4
     2             92          3
     2             93          4
     3             92          5
     3             93          5

Dataset I want (where it keeps all groups that had a wage value in 1992 that was >= 5)

 RestaurantID     Year        Wage
     1             92          6
     1             93          4
     3             92          5
     3             93          5

I tried:

data %>% group_by("RestaurantID") %>% filter(any(Wage>= '5', Year =='92'))

But this gives me all rows where wage is >=5.

回答1:

We could do this without grouping using filter

library(dplyr)
df1 %>% 
    filter(RestaurantID %in% RestaurantID[Year==92 & Wage>= 5])
#   RestaurantID Year Wage
#1            1   92    6
#2            1   93    4
#3            3   92    5
#4            3   93    5

or the same logic with base R

subset(df1, RestaurantID %in% RestaurantID[Year==92 & Wage>= 5])
#   RestaurantID Year Wage
#1            1   92    6
#2            1   93    4
#5            3   92    5
#6            3   93    5

回答2:

It's ok to have a single TRUE value per ID if you want all rows of that group returned. In that case, the TRUE value is recycled to the length of that group and hence all rows are returned.

df %>% group_by(RestaurantID) %>% filter(Wage[Year == 92] >= 5)
## A tibble: 4 x 3
## Groups:   RestaurantID [2]
#  RestaurantID  Year  Wage
#         <int> <int> <int>
#1            1    92     6
#2            1    93     4
#3            3    92     5
#4            3    93     5

Please note that when comparing numbers, you shouldn't put them in quote them like '5' because that turns the numbers into characters.

Alternatively, you could modify your original approach to:

df %>% group_by(RestaurantID) %>% filter(any(Wage >= 5 & Year == 92))

which also returns the correct subset.

来源：https://stackoverflow.com/questions/47890522/group-data-and-filter-groups-by-two-columns-dplyr

标签

filter

group-by

dplyr