R: finding duplicates in one column and collapsing a second column

两盒软妹~` submitted on 2019-11-29 08:44:24

You can use tapply() in base R:

# caution: unique() and tapply() may return the probes in different orders (see UPDATE)
data.frame(probes=unique(olap$probes), 
           genes=tapply(olap$genes, olap$probes, paste, collapse=" "))
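To see what the tapply() call produces, here is a minimal sketch on a small hypothetical olap data frame (the question's real data is not shown; these rows are modelled on the output printed in the aggregate() answer). tapply() splits genes by probe and pastes each group into one string, returning a character array whose names are the probe IDs.

```r
# Hypothetical toy data, modelled on the output shown in the aggregate() answer
olap <- data.frame(
  probes = c("cg00050873", "cg00061679", "cg00061679", "cg00061679"),
  genes  = c("TSPY4", "DAZ1", "DAZ4", "DAZ4"),
  stringsAsFactors = FALSE
)

# One pasted string per probe; names(collapsed) holds the probe IDs
collapsed <- tapply(olap$genes, olap$probes, paste, collapse = " ")

# Here the probes already appear in sorted order, so unique() happens to
# line up with the (sorted) order tapply() uses
res <- data.frame(probes = unique(olap$probes),
                  genes  = as.vector(collapsed),
                  stringsAsFactors = FALSE)
res
```

The alignment in this sketch is only correct because the toy probes were entered in sorted order.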

Or use plyr:

library(plyr)
ddply(olap, "probes", summarize, genes = paste(genes, collapse=" "))

UPDATE

In the first version, it's probably safer to do this:

tmp <- tapply(olap$genes, olap$probes, paste, collapse=" ")
data.frame(probes=names(tmp), genes=tmp)

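This guards against unique() and tapply() returning the probes in different orders: unique() keeps the order of first appearance, whereas tapply() groups by the sorted factor levels, so the two only line up when the probes already appear in sorted order. Personally, I would always use ddply.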
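A minimal sketch of the ordering pitfall, using hypothetical rows in which the probes deliberately do not appear in sorted order. Pairing unique() with the tapply() result attaches the wrong genes to each probe, while taking the IDs from names(tmp) keeps them aligned.

```r
# Hypothetical toy data: probes deliberately NOT in sorted order
olap <- data.frame(
  probes = c("cg00061679", "cg00050873", "cg00061679"),
  genes  = c("DAZ1", "TSPY4", "DAZ4"),
  stringsAsFactors = FALSE
)

tmp <- tapply(olap$genes, olap$probes, paste, collapse = " ")

unique(olap$probes)  # "cg00061679" "cg00050873"  (order of first appearance)
names(tmp)           # "cg00050873" "cg00061679"  (sorted factor levels)

# First version: probes and genes end up misaligned
naive <- data.frame(probes = unique(olap$probes), genes = as.vector(tmp),
                    stringsAsFactors = FALSE)

# Safer version: take the probe IDs from the tapply() result itself
safe <- data.frame(probes = names(tmp), genes = as.vector(tmp),
                   stringsAsFactors = FALSE)
```

In naive, cg00061679 is paired with TSPY4; in safe it is correctly paired with "DAZ1 DAZ4".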

Base R aggregate() should work fine for this:

aggregate(genes ~ probes, data = olap, as.vector)
#       probes            genes
# 1 cg00050873            TSPY4
# 2 cg00061679 DAZ1, DAZ4, DAZ4

I prefer as.vector in case I need to do any further work on the data (this stores the genes column as a list). If you prefer a single character string per probe, you can also try aggregate(genes ~ probes, data = olap, paste, collapse = " ").
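A minimal sketch contrasting the two aggregate() variants on hypothetical toy data matching the printed output above. With as.vector, groups of different sizes cannot be simplified, so the genes column is a list; with paste(collapse = " ") every group yields one string, so the column is an ordinary character vector.

```r
# Hypothetical toy data matching the printed output above
olap <- data.frame(
  probes = c("cg00050873", "cg00061679", "cg00061679", "cg00061679"),
  genes  = c("TSPY4", "DAZ1", "DAZ4", "DAZ4"),
  stringsAsFactors = FALSE
)

# genes becomes a list column: one character vector per probe
as_list <- aggregate(genes ~ probes, data = olap, as.vector)

# genes becomes a character column: one space-separated string per probe
as_text <- aggregate(genes ~ probes, data = olap, paste, collapse = " ")

as_list$genes[[2]]  # individual genes for cg00061679, still usable as a vector
as_text$genes       # pasted strings, easier to write out to CSV
```

The list form is handy for further per-probe processing (for example unique() on each element); the string form is simpler to print or export.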
