group data and filter groups by two columns (dplyr)

此生再无相见时 submitted on 2020-12-30 03:41:46

Question


I have a question regarding using dplyr to filter a dataset.

I want to group the data by RestaurantID and then keep every group whose Wage was >= 5 in 1992 (Year == 92).

For example:

I have:

 RestaurantID     Year        Wage
     1             92          6
     1             93          4
     2             92          3
     2             93          4
     3             92          5
     3             93          5

The dataset I want (keeping every group whose 1992 wage was >= 5):

 RestaurantID     Year        Wage
     1             92          6
     1             93          4
     3             92          5
     3             93          5

I tried:

data %>% group_by("RestaurantID") %>% filter(any(Wage>= '5', Year =='92')) 

But this gives me all rows where wage is >=5.


Answer 1:


We could do this without grouping, using filter:

library(dplyr)
df1 %>% 
    filter(RestaurantID %in% RestaurantID[Year==92 & Wage>= 5])
#   RestaurantID Year Wage
#1            1   92    6
#2            1   93    4
#3            3   92    5
#4            3   93    5

Or, with the same logic in base R:

subset(df1, RestaurantID %in% RestaurantID[Year==92 & Wage>= 5])
#   RestaurantID Year Wage
#1            1   92    6
#2            1   93    4
#5            3   92    5
#6            3   93    5
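
To try both versions end to end, here is a minimal sketch that rebuilds the sample data from the question. The name keep_ids is only an illustrative helper, not part of the original answer, and the columns are assumed to be numeric.

```r
library(dplyr)

# Sample data copied from the question (columns assumed numeric)
df1 <- data.frame(
  RestaurantID = c(1, 1, 2, 2, 3, 3),
  Year         = c(92, 93, 92, 93, 92, 93),
  Wage         = c(6, 4, 3, 4, 5, 5)
)

# keep_ids is a hypothetical helper name: the IDs of restaurants
# whose wage in year 92 was at least 5
keep_ids <- df1$RestaurantID[df1$Year == 92 & df1$Wage >= 5]

# Keep every row belonging to one of those restaurants
res_dplyr <- df1 %>% filter(RestaurantID %in% keep_ids)
res_base  <- subset(df1, RestaurantID %in% keep_ids)
```

Splitting out the ID lookup makes it easy to inspect which restaurants qualify before filtering. The one-line versions above do the same thing inline.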



Answer 2:


It's OK to return a single TRUE value per ID if you want all rows of that group. The TRUE value is recycled to the length of the group, so all of its rows are returned.

df %>% group_by(RestaurantID) %>% filter(Wage[Year == 92] >= 5)
## A tibble: 4 x 3
## Groups:   RestaurantID [2]
#  RestaurantID  Year  Wage
#         <int> <int> <int>
#1            1    92     6
#2            1    93     4
#3            3    92     5
#4            3    93     5

Please note that when comparing numbers, you shouldn't put them in quotes like '5', because that turns the numbers into characters.

Alternatively, you could modify your original approach to:

df %>% group_by(RestaurantID) %>% filter(any(Wage >= 5 & Year == 92))

which also returns the correct subset.
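
As a rough illustration of why the original attempt kept too much, the base R sketch below looks at restaurant 2 on its own. The variable names are made up for this example. Note also that the corrected call groups by the bare column name RestaurantID rather than the quoted string used in the question.

```r
# Restaurant 2 from the question: this group should be dropped
wage <- c(3, 4)
year <- c(92, 93)

# any() with several arguments is TRUE if ANY value in ANY argument is TRUE,
# so the two conditions are effectively OR-ed together
or_result <- any(wage >= 5, year == 92)

# Combining with & first requires both conditions to hold on the same row
and_result <- any(wage >= 5 & year == 92)

# Quoting the number makes R compare text instead of numbers
num_cmp <- 10 >= 5    # numeric comparison
chr_cmp <- 10 >= "5"  # 10 is coerced to "10", which sorts before "5"

print(c(or_result = or_result, and_result = and_result,
        num_cmp = num_cmp, chr_cmp = chr_cmp))
```

Here or_result is TRUE while and_result is FALSE, which is why the & form drops restaurant 2. The any() form also copes with a group that has no row for year 92: Wage[Year == 92] would then have length zero rather than one, whereas any() of an empty vector is still a single FALSE.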



Source: https://stackoverflow.com/questions/47890522/group-data-and-filter-groups-by-two-columns-dplyr
