Group by columns and summarize a column into a list

Question

I have a dataframe like this:

sample_df<-data.frame(
   client=c('John', 'John','Mary','Mary'),
   date=c('2016-07-13','2016-07-13','2016-07-13','2016-07-13'),
   cluster=c('A','B','A','A'))

#sample data frame
   client date         cluster
1  John   2016-07-13    A 
2  John   2016-07-13    B 
3  Mary   2016-07-13    A 
4  Mary   2016-07-13    A

I would like to transform it into a different format, like this:

#ideal data frame
   client date         cluster
1  John   2016-07-13    c('A','B')
2  Mary   2016-07-13    A

The 'cluster' column should hold a list whenever a client belongs to more than one cluster on the same date.

I thought I could do it with the dplyr package, using a command like the one below:

library(dplyr)
ideal_df <- sample_df %>%
    group_by(client, date) %>%
    summarize(
        # some anonymous function
    )

However, I don't know how to write the anonymous function in this situation. Is there a way to transform the data into the ideal format?

Answer 1:

We can use toString to concatenate the unique elements of 'cluster' into a single comma-separated string after grouping by 'client' and 'date':

r1 <- sample_df %>% 
         group_by(client, date) %>%
         summarise(cluster = toString(unique(cluster)))
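As a rough, self-contained check of what the toString approach returns, the sketch below rebuilds the sample data and prints the summarised column. The `stringsAsFactors = FALSE` and `.groups = "drop"` arguments are additions not present in the original answer; the latter assumes dplyr 1.0.0 or newer.

```r
library(dplyr)

# Sample data from the question; stringsAsFactors = FALSE is an added
# assumption so 'cluster' is a character vector on any R version.
sample_df <- data.frame(
  client  = c("John", "John", "Mary", "Mary"),
  date    = c("2016-07-13", "2016-07-13", "2016-07-13", "2016-07-13"),
  cluster = c("A", "B", "A", "A"),
  stringsAsFactors = FALSE
)

# One row per client/date; unique clusters are pasted into one string.
# .groups = "drop" (assumes dplyr >= 1.0.0) returns an ungrouped result.
r1 <- sample_df %>%
  group_by(client, date) %>%
  summarise(cluster = toString(unique(cluster)), .groups = "drop")

print(r1$cluster)
```

Note that the result is a plain character column, so "A, B" is a single string rather than two separate values.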

Another option is to create a list column, which keeps each group's values as a vector instead of pasting them into one string:

r2 <- sample_df %>%
         group_by(client, date) %>% 
         summarise(cluster = list(unique(cluster)))
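To make the difference from the string version concrete, here is a minimal sketch that builds the list column and inspects its elements. As above, `stringsAsFactors = FALSE` and `.groups = "drop"` are assumptions added for illustration (the latter requires dplyr 1.0.0 or newer).

```r
library(dplyr)

# Sample data from the question (character columns assumed).
sample_df <- data.frame(
  client  = c("John", "John", "Mary", "Mary"),
  date    = c("2016-07-13", "2016-07-13", "2016-07-13", "2016-07-13"),
  cluster = c("A", "B", "A", "A"),
  stringsAsFactors = FALSE
)

# Wrapping the vector in list() stores it as one element of a list column.
r2 <- sample_df %>%
  group_by(client, date) %>%
  summarise(cluster = list(unique(cluster)), .groups = "drop")

# Number of distinct clusters per client/date row.
cluster_counts <- lengths(r2$cluster)
print(cluster_counts)

# The first row (John) holds a character vector with both clusters.
print(r2$cluster[[1]])
```

Each element of `r2$cluster` is an ordinary character vector, so it can be indexed or passed to functions such as `%in%` without splitting a string first.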

The list column can later be expanded back into one row per element with unnest from tidyr:

library(tidyr)
r2 %>%
    ungroup() %>%
    unnest(cluster)
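The round trip can be sketched end to end as follows. This assumes tidyr 1.0.0 or newer, where the column to expand is named explicitly, and dplyr 1.0.0 or newer for `.groups = "drop"`; neither argument appears in the original answer.

```r
library(dplyr)
library(tidyr)

# Sample data from the question (character columns assumed).
sample_df <- data.frame(
  client  = c("John", "John", "Mary", "Mary"),
  date    = c("2016-07-13", "2016-07-13", "2016-07-13", "2016-07-13"),
  cluster = c("A", "B", "A", "A"),
  stringsAsFactors = FALSE
)

# Collapse to a list column, then expand it again.
r2 <- sample_df %>%
  group_by(client, date) %>%
  summarise(cluster = list(unique(cluster)), .groups = "drop")

# unnest(cluster) produces one row per list element.
r3 <- r2 %>%
  unnest(cluster)

print(r3)
```

Because `unique()` was applied before nesting, the unnested result has three rows rather than the original four: Mary's duplicate 'A' row is not restored.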

Source: https://stackoverflow.com/questions/38348074/group-by-columns-and-summarize-a-column-into-a-list

Tags

group-by

dplyr