Collapse / concatenate / aggregate multiple columns to a single comma separated string within each group

╄→гoц情女王★ 提交于 2021-02-13 17:04:11

问题


This is an extension to post Collapse / concatenate / aggregate a column to a single comma separated string within each group

Goal: aggregate multiple columns according to one grouping variable and separate individual values by separator of choice.

Reproducible example:

data <- data.frame(A = c(rep(111, 3), rep(222, 3)), B = c(rep(c(100), 3), rep(200,3)), C = rep(c(1,2,NA),2), D = c(15:20), E = rep(c(1,NA,NA),2))
data
    A   B  C  D  E
1 111 100  1 15  1
2 111 100  2 16 NA
3 111 100 NA 17 NA
4 222 200  1 18  1
5 222 200  2 19 NA
6 222 200 NA 20 NA

A is the grouping variable but B is still displayed in overall result (B depends on A in my application) and C, D and E are the variables to be collapsed into separated character strings.

Desired Output

    A   B  C    D         E
1 111 100  1,2  15,16,17  1
2 222 100  1,2  18,19,20  1    

I don't have a ton of experience with R. I did try to expand upon the solutions posted by G. Grothendieck to the linked post to meet my requirements but can't quite get it right for multiple columns.

What would be a proper implementation to get the desired output?

I focused specifically on group_by and summarise_all and aggregate in my attempts. They are a complete mess so I don't believe it would even be helpful to display.

EDIT: Solutions posted work great at displaying desired result! To continue improving the value in this post for those that find it.

How would it be possible for users to select their own separation characters. e.g. '-', '\n' The current solutions by @akrun and @tmfmnk both result in lists instead of a concatenated character string. Please correct me if I said this incorrectly.

data$D
[1] 15 16 17 18 19 20
> data$A
[1] 111 111 111 222 222 222
> data$B
[1] 100 100 100 200 200 200
> data$C
[1]  1  2 NA  1  2 NA
> data$D
[1] 15 16 17 18 19 20
> data$E
[1]  1 NA NA  1 NA NA

回答1:


We can group by 'A', 'B', and use summarise_at to paste all the non-NA elements

library(dplyr)
data %>% 
    group_by(A, B) %>%
    summarise_at(vars(-group_cols()), ~ toString(.[!is.na(.)]))
# A tibble: 2 x 5
# Groups:   A [2]
#      A     B C     D          E    
#  <dbl> <dbl> <chr> <chr>      <chr>
#1   111   100 1, 2  15, 16, 17 1    
#2   222   200 1, 2  18, 19, 20 1   

If we need to pass custom delimiter, use paste or str_c

library(stringr)
data %>% 
    group_by(A, B) %>%
    summarise_at(vars(-group_cols()), ~ str_c(.[!is.na(.)], collapse="_"))

Or using base R with aggregate

aggregate(. ~ A + B, data, FUN = function(x) 
      toString(x[!is.na(x)]), na.action = NULL)



回答2:


With dplyr, you can do:

data %>%
 group_by(A, B) %>%
 summarise_all(~ toString(na.omit(.)))

      A     B C     D          E    
  <dbl> <dbl> <chr> <chr>      <chr>
1   111   100 1, 2  15, 16, 17 1    
2   222   200 1, 2  18, 19, 20 1 


来源:https://stackoverflow.com/questions/60233119/collapse-concatenate-aggregate-multiple-columns-to-a-single-comma-separated

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!