R: How to combine duplicated rows from multiple columns based on unique values in a single column and merge those unique values by |?

安稳与你 submitted on 2020-01-06 04:31:51

Question


I have the following data frame:

gene    gene_name   source  chromosome  details
1       a           A           2       01; xyz
1       a           A           2       02; ijk
2       b           B           3       03; efg
2       b           C           3       03; efg
3       c           D           4       04; lmn
3       c           D           4       05; opq
3       c           D           4       06; rst
4       NA          10          6       NA
4       NA          11          6       NA
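
For reproducibility, the table above could be rebuilt as follows. This is only a sketch: the column types (for example, source stored as character so that "A" and "10" can share a column) are assumptions, since the question does not state them.

```r
# Rebuild the example data frame from the question.
# Column types are assumed; the question only shows the printed table.
df <- data.frame(
  gene       = c(1, 1, 2, 2, 3, 3, 3, 4, 4),
  gene_name  = c("a", "a", "b", "b", "c", "c", "c", NA, NA),
  source     = c("A", "A", "B", "C", "D", "D", "D", "10", "11"),
  chromosome = c(2, 2, 3, 3, 4, 4, 4, 6, 6),
  details    = c("01; xyz", "02; ijk", "03; efg", "03; efg",
                 "04; lmn", "05; opq", "06; rst", NA, NA),
  stringsAsFactors = FALSE  # keep text columns as character vectors
)

print(df)
```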

I want to get the following output:

gene    gene_name   source  chromosome  details
1       a           A       2           01; xyz | 02; ijk
2       b           B, C    3           03; efg
3       c           D       4           04; lmn | 05; opq | 06; rst
4       NA          10, 11  6           NA | NA

I have tried aggregate() and group_by() in several ways, but could not get this result.

Please guide.

Thanks.


Answer 1:


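Grouping by source as well would keep the rows for genes 2 and 4 separate, so group only by the columns that are constant within a gene and collapse the other two:

library(dplyr)

df %>%
  group_by(gene, gene_name, chromosome) %>%
  # keep each distinct value once, then join with the requested separator
  summarise(source  = paste(unique(source), collapse = ", "),
            details = paste(unique(details), collapse = " | "))

Note that unique() reduces the details of gene 4 to a single "NA" string rather than "NA | NA", and that chromosome ends up before source in the result.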

I ran the code below on iris and got a result similar to what you described:

iris %>%
  group_by(Sepal.Length, Sepal.Width, Petal.Length, Species) %>%
  summarise(Petal.Width = paste(Petal.Width, collapse = " | "))
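
If the requested output has to be matched exactly, including "B, C" for gene 2 and "NA | NA" for gene 4, a base R version along the following lines might help. It is only a sketch: the data frame is rebuilt with assumed column types, collapse_keep_na() is a hypothetical helper written for this example, and its rule (drop repeated values but keep every NA) is inferred from the desired table rather than stated in the question.

```r
# Example data from the question; column types are assumptions.
df <- data.frame(
  gene       = c(1, 1, 2, 2, 3, 3, 3, 4, 4),
  gene_name  = c("a", "a", "b", "b", "c", "c", "c", NA, NA),
  source     = c("A", "A", "B", "C", "D", "D", "D", "10", "11"),
  chromosome = c(2, 2, 3, 3, 4, 4, 4, 6, 6),
  details    = c("01; xyz", "02; ijk", "03; efg", "03; efg",
                 "04; lmn", "05; opq", "06; rst", NA, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep the first occurrence of each value plus every NA,
# then join what is left with the given separator.
collapse_keep_na <- function(x, sep) {
  paste(x[!duplicated(x) | is.na(x)], collapse = sep)
}

# Split the rows by gene, build one summary row per gene, and stack them.
result <- do.call(rbind, lapply(split(df, df$gene), function(g) {
  data.frame(
    gene       = g$gene[1],
    gene_name  = g$gene_name[1],
    source     = collapse_keep_na(g$source, ", "),
    chromosome = g$chromosome[1],
    details    = collapse_keep_na(g$details, " | "),
    stringsAsFactors = FALSE
  )
}))
rownames(result) <- NULL

print(result)
```

Taking the first value of gene_name and chromosome assumes they are constant within each gene, as they are in the example data.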


Source: https://stackoverflow.com/questions/58958102/r-how-to-combine-duplicated-rows-from-multiple-columns-based-on-unique-values-i
