Question:
I have a question about filtering with the dplyr package in R.
My current data frame looks like this:
   url                       season    salary
   <fct>                     <fct>      <dbl>
 1 /players/a/abrinal01.html 2016-17  5725000
 2 /players/a/ackeral01.html 2008-09   711517
 3 /players/a/acyqu01.html   2012-13   788872
 4 /players/a/acyqu01.html   2013-14   915243
 5 /players/a/acyqu01.html   2014-15   981348
 6 /players/a/acyqu01.html   2015-16  1914544
 7 /players/a/acyqu01.html   2016-17  1709538
 8 /players/a/adamsjo01.html 2014-15  1404600
 9 /players/a/adamsst01.html 2014-15  3140517
10 /players/a/adamsst01.html 2016-17 22471910
11 /players/a/adamsst01.html 2017-18  2571910
I would like to group by url and keep only the players (urls) that played in all three seasons 2012-13, 2013-14 and 2014-15, and for those players keep only the rows for those three seasons.
I tried the following, but it gives an error:
p_filter <- p_g_stagger %>%
  dplyr::group_by(url) %>%
  dplyr::filter(season == c('2012-13', '2013-14', '2014-15'))
Error in filter_impl(.data, quo) : Result must have length 1, not 3
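A short base-R sketch of why the attempt fails. With grouped data, filter() evaluates season == c(...) once per url group, and == compares element by element, recycling the shorter vector. For a url with a single row (or any group of fewer than three rows) the comparison returns three values instead of one per row, which appears to be what "Result must have length 1, not 3" complains about in the dplyr version the asker was using. For a group of exactly three rows it does not fail but silently compares by position instead of testing membership. %in% returns one value per row. The season vectors below are made up for illustration.

```r
wanted <- c("2012-13", "2013-14", "2014-15")

# A url group with a single row (like abrinal01 in the question)
one_row <- c("2016-17")
print(one_row == wanted)    # 3 values: the 1-row group is recycled to length 3
print(one_row %in% wanted)  # 1 value: one answer per row

# A hypothetical 3-row group whose seasons are listed in a different order
three_rows <- c("2013-14", "2012-13", "2014-15")
print(three_rows == wanted)    # FALSE FALSE TRUE: compares position by position
print(three_rows %in% wanted)  # TRUE TRUE TRUE: membership, order does not matter
```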
My desired output is:
   url                       season    salary
   <fct>                     <fct>      <dbl>
 1 /players/a/acyqu01.html   2012-13   788872
 2 /players/a/acyqu01.html   2013-14   915243
 3 /players/a/acyqu01.html   2014-15   981348
Answer 1:
We need two conditions in filter:
1) keep only the groups (url) that contain every season in season_needed;
2) within those groups, keep only the rows whose season is in season_needed.
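library(dplyr)
season_needed <- c('2012-13', '2013-14', '2014-15')
df %>%
  group_by(url) %>%
  # condition 1 (one value per group): the url has every needed season
  # condition 2 (one value per row): this row's season is one of the needed ones
  filter(all(season_needed %in% season) & season %in% season_needed)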
#  url                     season  salary
#  <fct>                   <fct>    <int>
#1 /players/a/acyqu01.html 2012-13 788872
#2 /players/a/acyqu01.html 2013-14 915243
#3 /players/a/acyqu01.html 2014-15 981348
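To check the two conditions against the question's own data, here is a reproducible sketch. The data frame is rebuilt by hand from the printout in the question, using plain character columns instead of factors (an assumption that should not change which rows are kept), and it is named p_g_stagger to match the question rather than df.

```r
library(dplyr)

# Rebuilt by hand from the question's printout (character columns, not factors)
p_g_stagger <- data.frame(
  url = c("/players/a/abrinal01.html", "/players/a/ackeral01.html",
          rep("/players/a/acyqu01.html", 5), "/players/a/adamsjo01.html",
          rep("/players/a/adamsst01.html", 3)),
  season = c("2016-17", "2008-09", "2012-13", "2013-14", "2014-15",
             "2015-16", "2016-17", "2014-15", "2014-15", "2016-17", "2017-18"),
  salary = c(5725000, 711517, 788872, 915243, 981348, 1914544, 1709538,
             1404600, 3140517, 22471910, 2571910),
  stringsAsFactors = FALSE
)

season_needed <- c("2012-13", "2013-14", "2014-15")

result <- p_g_stagger %>%
  group_by(url) %>%
  # all(...) yields one TRUE/FALSE per url group, recycled over the group's rows;
  # season %in% season_needed yields one TRUE/FALSE per row
  filter(all(season_needed %in% season) & season %in% season_needed) %>%
  ungroup()

print(result)
```

Only acyqu01 has all three seasons, so only its 2012-13, 2013-14 and 2014-15 rows should remain; adamsjo01 and adamsst01 are dropped because they have 2014-15 but not the other two.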
Answer 2:
Another approach, using add_count():
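library(dplyr)
seasons_in <- c('2012-13', '2013-14', '2014-15')
p_g_stagger %>%
  # keep only rows from the seasons of interest
  filter(season %in% seasons_in) %>%
  # count the remaining rows per url into a new column nb_seasons
  add_count(url, name = "nb_seasons") %>%
  # keep urls whose row count equals the number of seasons (assumes one row per url and season)
  filter(nb_seasons == length(seasons_in)) %>%
  select(-nb_seasons)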
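One caveat with this approach: add_count() counts rows, not distinct seasons. If the data could ever hold more than one row for the same url and season (we do not know whether it can), a player with a duplicated season and a missing one would still reach a count of 3. The sketch below compares the row count with a distinct-season count via n_distinct(); the url "/players/x/dup.html" and its salaries are hypothetical, made up only to trigger the difference.

```r
library(dplyr)

seasons_in <- c("2012-13", "2013-14", "2014-15")

# Hypothetical data: dup.html has two 2012-13 rows and no 2014-15 row,
# so it has 3 matching rows but only 2 distinct seasons
toy <- data.frame(
  url    = c(rep("/players/a/acyqu01.html", 3), rep("/players/x/dup.html", 3)),
  season = c("2012-13", "2013-14", "2014-15", "2012-13", "2012-13", "2013-14"),
  salary = c(788872, 915243, 981348, 100, 200, 300),
  stringsAsFactors = FALSE
)

# Row count, as in Answer 2: the duplicated player slips through
by_rows <- toy %>%
  filter(season %in% seasons_in) %>%
  add_count(url, name = "nb_seasons") %>%
  filter(nb_seasons == length(seasons_in)) %>%
  select(-nb_seasons)

# Distinct-season count: only urls with all three seasons remain
by_distinct <- toy %>%
  filter(season %in% seasons_in) %>%
  group_by(url) %>%
  filter(n_distinct(season) == length(seasons_in)) %>%
  ungroup()

print(by_rows)
print(by_distinct)
```

On the question's data, where each url appears at most once per season, both versions keep the same rows.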
Source: https://stackoverflow.com/questions/55037179/filtering-with-a-set-of-values-in-r-dplyr