R: nested grouped summaries with dplyr?

问题

I'm trying to practise the R dplyr package with a hypothetical dataset (link to pastebin) of people's drinking records at different bars:

bar_name,person,drink_ordered,times_ordered,liked_it
Moe’s Tavern,Homer,Romulan ale,2,TRUE
Moe’s Tavern,Homer,Scotch whiskey,1,FALSE
Moe’s Tavern,Guinan,Romulan ale,1,TRUE
Moe’s Tavern,Guinan,Scotch whiskey,3,FALSE
Moe’s Tavern,Rebecca,Romulan ale,2,FALSE
Moe’s Tavern,Rebecca,Scotch whiskey,4,TRUE
Cheers,Rebecca,Budweiser,1,TRUE
Cheers,Rebecca,Black Hole,1,TRUE
Cheers,Bender,Budweiser,1,FALSE
Cheers,Bender,Black Hole,1,TRUE
Cheers,Krusty,Budweiser,1,TRUE
Cheers,Krusty,Black Hole,1,FALSE
The Hip Joint,Homer,Scotch whiskey,3,FALSE
The Hip Joint,Homer,Corona,1,TRUE
The Hip Joint,Homer,Budweiser,1,FALSE
The Hip Joint,Krusty,Romulan ale,3,TRUE
The Hip Joint,Krusty,Black Hole,4,FALSE
The Hip Joint,Krusty,Corona,1,TRUE
The Hip Joint,Rebecca,Corona,2,TRUE
The Hip Joint,Rebecca,Romulan ale,4,FALSE
The Hip Joint,Bender,Corona,1,TRUE
Ten Forward,Bender,Romulan ale,1,
Ten Forward,Bender,Black Hole,,FALSE
Ten Forward,Guinan,Romulan ale,2,TRUE
Ten Forward,Guinan,Budweiser,,FALSE
Ten Forward,Krusty,Budweiser,1,
Ten Forward,Krusty,Black Hole,1,FALSE
Mos Eisley,Krusty,Black Hole,1,TRUE
Mos Eisley,Krusty,Corona,2,FALSE
Mos Eisley,Krusty,Romulan ale,1,TRUE
Mos Eisley,Homer,Black Hole,1,TRUE
Mos Eisley,Homer,Corona,2,FALSE
Mos Eisley,Homer,Romulan ale,1,TRUE
Mos Eisley,Bender,Black Hole,1,TRUE
Mos Eisley,Bender,Corona,2,FALSE
Mos Eisley,Bender,Romulan ale,1,TRUE

I have used dplyr's group_by() and summarise() functions a couple times, but am not sure how to deal with more nested situations. Specifically, I wanna ask questions like:

For each unique bar_name, did each person order the exact same combination of drinks (drink_ordered)? In this dataset, this would be marked TRUE for the bars Moe's Tavern, Cheers, and Mos Eisley.
Even if each person ordered the exact same combination of drinks in a particular bar_name, did they order the drinks the same number of times (times_ordered)? For example, Moe's Tavern and Mos Eisley would me marked as TRUE for this question.
Then, even if each person ordered the exact same combination of drinks in a particular bar the same number of times, are their opinions (liked_it) of the drinks exactly the same? In this dataset that would be TRUE for Mos Eisley.

Observe that in the dataset there are cases (The Hip Joint) where the answer would be FALSE for all three questions, and there are missing values (Ten Forward).

Ideally, I hope to produce a table where the first column is bar_name, and three more boolean columns saying TRUE or FALSE for each of the three questions.

How do I efficiently achieve this with dplyr in R? Thank you very much.

回答1:

You can do:

DF %>%
  arrange(drink_ordered, times_ordered, liked_it) %>% group_by(bar_name, person) %>%
  summarise(
    Ld   = toString(drink_ordered),
    Ldt  = paste(Ld, toString(times_ordered), sep="_"),
    Ldtl = paste(Ldt, toString(liked_it), sep="_")
  ) %>% 
  group_by(bar_name) %>% 
  summarise_each(funs(n_distinct)) %>%
  mutate_each(funs(. == 1), -person, -bar_name)

#        bar_name person    Ld   Ldt  Ldtl
#           (chr)  (int) (lgl) (lgl) (lgl)
# 1        Cheers      3  TRUE  TRUE FALSE
# 2  Moe’s Tavern      3  TRUE FALSE FALSE
# 3    Mos Eisley      3  TRUE  TRUE  TRUE
# 4   Ten Forward      3 FALSE FALSE FALSE
# 5 The Hip Joint      4 FALSE FALSE FALSE

来源：https://stackoverflow.com/questions/37034627/r-nested-grouped-summaries-with-dplyr

标签

dplyr

summarization