Question
I'm trying to practise the R dplyr package with a hypothetical dataset (link to pastebin) of people's drinking records at different bars:
bar_name,person,drink_ordered,times_ordered,liked_it
Moe’s Tavern,Homer,Romulan ale,2,TRUE
Moe’s Tavern,Homer,Scotch whiskey,1,FALSE
Moe’s Tavern,Guinan,Romulan ale,1,TRUE
Moe’s Tavern,Guinan,Scotch whiskey,3,FALSE
Moe’s Tavern,Rebecca,Romulan ale,2,FALSE
Moe’s Tavern,Rebecca,Scotch whiskey,4,TRUE
Cheers,Rebecca,Budweiser,1,TRUE
Cheers,Rebecca,Black Hole,1,TRUE
Cheers,Bender,Budweiser,1,FALSE
Cheers,Bender,Black Hole,1,TRUE
Cheers,Krusty,Budweiser,1,TRUE
Cheers,Krusty,Black Hole,1,FALSE
The Hip Joint,Homer,Scotch whiskey,3,FALSE
The Hip Joint,Homer,Corona,1,TRUE
The Hip Joint,Homer,Budweiser,1,FALSE
The Hip Joint,Krusty,Romulan ale,3,TRUE
The Hip Joint,Krusty,Black Hole,4,FALSE
The Hip Joint,Krusty,Corona,1,TRUE
The Hip Joint,Rebecca,Corona,2,TRUE
The Hip Joint,Rebecca,Romulan ale,4,FALSE
The Hip Joint,Bender,Corona,1,TRUE
Ten Forward,Bender,Romulan ale,1,
Ten Forward,Bender,Black Hole,,FALSE
Ten Forward,Guinan,Romulan ale,2,TRUE
Ten Forward,Guinan,Budweiser,,FALSE
Ten Forward,Krusty,Budweiser,1,
Ten Forward,Krusty,Black Hole,1,FALSE
Mos Eisley,Krusty,Black Hole,1,TRUE
Mos Eisley,Krusty,Corona,2,FALSE
Mos Eisley,Krusty,Romulan ale,1,TRUE
Mos Eisley,Homer,Black Hole,1,TRUE
Mos Eisley,Homer,Corona,2,FALSE
Mos Eisley,Homer,Romulan ale,1,TRUE
Mos Eisley,Bender,Black Hole,1,TRUE
Mos Eisley,Bender,Corona,2,FALSE
Mos Eisley,Bender,Romulan ale,1,TRUE
I have used dplyr's group_by() and summarise() functions a couple of times, but I am not sure how to deal with more nested situations. Specifically, I want to ask questions like:
1. For each unique bar_name, did each person order the exact same combination of drinks (drink_ordered)? In this dataset, this would be marked TRUE for the bars Moe's Tavern, Cheers, and Mos Eisley.
2. Even if each person ordered the exact same combination of drinks in a particular bar_name, did they order the drinks the same number of times (times_ordered)? For example, Cheers and Mos Eisley would be marked as TRUE for this question.
3. Then, even if each person ordered the exact same combination of drinks in a particular bar the same number of times, are their opinions (liked_it) of the drinks exactly the same? In this dataset that would be TRUE for Mos Eisley.
Observe that in the dataset there are cases (The Hip Joint) where the answer would be FALSE for all three questions, and there are missing values (Ten Forward).
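As a small sketch of how those blank fields behave once loaded, assuming the CSV is read with base R's read.csv(): the name ten_forward and the choice of four rows are only for illustration.

```r
# Sketch only: the four Ten Forward rows that contain a blank field, inlined as text.
ten_forward <- read.csv(text = "bar_name,person,drink_ordered,times_ordered,liked_it
Ten Forward,Bender,Romulan ale,1,
Ten Forward,Bender,Black Hole,,FALSE
Ten Forward,Guinan,Budweiser,,FALSE
Ten Forward,Krusty,Budweiser,1,",
  stringsAsFactors = FALSE)

# Blank fields in numeric and logical columns are read as NA.
missing_times <- is.na(ten_forward$times_ordered)
missing_liked <- is.na(ten_forward$liked_it)

# toString() renders NA as the text "NA", so a missing value still takes part
# in any string-based comparison of one person's records against another's.
liked_as_text <- toString(ten_forward$liked_it)
```

Any approach that compares people by concatenated strings therefore treats a missing value as just another value rather than dropping it.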
Ideally, I hope to produce a table where the first column is bar_name, followed by three more boolean columns saying TRUE or FALSE for each of the three questions.
How do I efficiently achieve this with dplyr in R? Thank you very much.
Answer 1:
You can do:
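library(dplyr)

# DF is the data frame holding the records from the question.
DF %>%
  # Sort first so each person's drinks are concatenated in a consistent order.
  arrange(drink_ordered, times_ordered, liked_it) %>%
  group_by(bar_name, person) %>%
  # Build one signature string per person and bar for each of the three questions.
  summarise(
    Ld = toString(drink_ordered),
    Ldt = paste(Ld, toString(times_ordered), sep = "_"),
    Ldtl = paste(Ldt, toString(liked_it), sep = "_")
  ) %>%
  group_by(bar_name) %>%
  # Count distinct values per bar; exactly one signature means everyone matches.
  summarise_each(funs(n_distinct)) %>%
  mutate_each(funs(. == 1), -person, -bar_name)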
#        bar_name person    Ld   Ldt  Ldtl
#           (chr)  (int) (lgl) (lgl) (lgl)
# 1        Cheers      3  TRUE  TRUE FALSE
# 2  Moe’s Tavern      3  TRUE FALSE FALSE
# 3    Mos Eisley      3  TRUE  TRUE  TRUE
# 4   Ten Forward      3 FALSE FALSE FALSE
# 5 The Hip Joint      4 FALSE FALSE FALSE
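summarise_each(), mutate_each() and funs() have since been deprecated in dplyr, so here is a hedged sketch of the same idea using across(). It assumes dplyr 1.0.0 or later and inlines the question's data through read.csv() so that it runs on its own. The name result is only for illustration, and the person count column from the output above is dropped.

```r
library(dplyr)

# The dataset from the question, inlined so the sketch is self-contained.
DF <- read.csv(text = "bar_name,person,drink_ordered,times_ordered,liked_it
Moe’s Tavern,Homer,Romulan ale,2,TRUE
Moe’s Tavern,Homer,Scotch whiskey,1,FALSE
Moe’s Tavern,Guinan,Romulan ale,1,TRUE
Moe’s Tavern,Guinan,Scotch whiskey,3,FALSE
Moe’s Tavern,Rebecca,Romulan ale,2,FALSE
Moe’s Tavern,Rebecca,Scotch whiskey,4,TRUE
Cheers,Rebecca,Budweiser,1,TRUE
Cheers,Rebecca,Black Hole,1,TRUE
Cheers,Bender,Budweiser,1,FALSE
Cheers,Bender,Black Hole,1,TRUE
Cheers,Krusty,Budweiser,1,TRUE
Cheers,Krusty,Black Hole,1,FALSE
The Hip Joint,Homer,Scotch whiskey,3,FALSE
The Hip Joint,Homer,Corona,1,TRUE
The Hip Joint,Homer,Budweiser,1,FALSE
The Hip Joint,Krusty,Romulan ale,3,TRUE
The Hip Joint,Krusty,Black Hole,4,FALSE
The Hip Joint,Krusty,Corona,1,TRUE
The Hip Joint,Rebecca,Corona,2,TRUE
The Hip Joint,Rebecca,Romulan ale,4,FALSE
The Hip Joint,Bender,Corona,1,TRUE
Ten Forward,Bender,Romulan ale,1,
Ten Forward,Bender,Black Hole,,FALSE
Ten Forward,Guinan,Romulan ale,2,TRUE
Ten Forward,Guinan,Budweiser,,FALSE
Ten Forward,Krusty,Budweiser,1,
Ten Forward,Krusty,Black Hole,1,FALSE
Mos Eisley,Krusty,Black Hole,1,TRUE
Mos Eisley,Krusty,Corona,2,FALSE
Mos Eisley,Krusty,Romulan ale,1,TRUE
Mos Eisley,Homer,Black Hole,1,TRUE
Mos Eisley,Homer,Corona,2,FALSE
Mos Eisley,Homer,Romulan ale,1,TRUE
Mos Eisley,Bender,Black Hole,1,TRUE
Mos Eisley,Bender,Corona,2,FALSE
Mos Eisley,Bender,Romulan ale,1,TRUE", stringsAsFactors = FALSE)

result <- DF %>%
  # Sort so each person's records are concatenated in the same order.
  arrange(drink_ordered, times_ordered, liked_it) %>%
  group_by(bar_name, person) %>%
  # One signature string per person and bar for each of the three questions.
  summarise(
    Ld   = toString(drink_ordered),
    Ldt  = paste(Ld, toString(times_ordered), sep = "_"),
    Ldtl = paste(Ldt, toString(liked_it), sep = "_"),
    .groups = "drop"
  ) %>%
  group_by(bar_name) %>%
  # A bar passes a question when all its people share a single signature.
  summarise(across(c(Ld, Ldt, Ldtl), ~ n_distinct(.x) == 1))

print(result)
```

On this data the three logical columns should match the Ld, Ldt and Ldtl columns shown above. Folding the `== 1` comparison into the across() call is a design choice that replaces the separate mutate_each() step.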
Source: https://stackoverflow.com/questions/37034627/r-nested-grouped-summaries-with-dplyr