Merge rows with duplicate IDs

Posted by 一世执手 on 2019-12-02 22:08:00

Question


I would like to merge rows that share a duplicated ID, summing their values column by column.

For example, the data frame below contains the duplicated symbol 'LOC102723897' (rows 1 and 5). I would like to merge these two rows by summing the values in each column, so that the duplicated symbol ends up with a single row.

> head(y$genes)
  SM01 SM02 SM03 SM04 SM05 SM06 SM07 SM08 SM09 SM10 SM11 SM12 SM13 SM14 SM15 SM16 SM17 SM18 SM19 SM20 SM21 SM22
1   32   29   23   20   27  105   80   64   83   80   94   58  122   76   78   70   34   32   45   42  138   30
2  246  568  437  343  304  291  542  457  608  433  218  329  483  376  410  296  550  533  537  473  296  382
3   30   23   30   13   20   18   23   13   31   11   15   27   36   21   23   25   26   27   37   27   31   16
4 1450 2716 2670 2919 2444 1668 2923 2318 3867 2084 1121 2175 3022 2308 2541 1613 2196 1851 2843 2078 2180 1902
5  288  366  327  334  314  267  550  410  642  475  219  414  679  420  425  308  359  406  550  398  399  268
6   34   59   62   68   42   31   49   45   62   51   40   32   30   39   41   75   54   59   83   99   37   37
  SM23 SM24 SM25 SM26 SM27 SM28 SM29 SM30       Symbol
1   41   23   57  160   84   67   87  113 LOC102723897
2  423  535  624  304  568  495  584  603    LINC01128
3   31   21   49   13   33   31   14   31    LINC00115
4 2453 3041 3590 2343 3450 3725 3336 3850        NOC2L
5  403  347  468  478  502  563  611  577 LOC102723897
6   45   51   56  107   79  105   92  131      PLEKHN1
> dim(y)
[1] 12928    30

I tried using plyr to merge the rows based on the 'Symbol' column, but it did not work: the number of rows is unchanged afterwards, even though there are fewer unique symbols than rows.

> ddply(y$genes,"Symbol",numcolwise(sum))
> dim(y)
[1] 12928    30
> length(y$genes$Symbol)
[1] 12928
> length(unique(y$genes$Symbol))
[1] 12896

Answer 1:


Group by Symbol and sum all the other columns.

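library(dplyr)
# Sum every remaining column within each Symbol; assign the result to keep it.
# (summarise_all(sum) is the older, superseded spelling of the same operation.)
merged <- df %>% group_by(Symbol) %>% summarise(across(everything(), sum))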
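A minimal sketch on a made-up data frame may help to check the behaviour. The name `toy` and its values are hypothetical, not taken from the question, and `across()` assumes dplyr 1.0.0 or later. The summarised result is returned as a new object and has to be assigned; the input is not modified in place, which may be why the row count in the question appeared unchanged.

```r
library(dplyr)

# Hypothetical toy data: symbol "A" appears twice
toy <- data.frame(
  SM01   = c(32, 246, 288),
  SM02   = c(29, 568, 366),
  Symbol = c("A", "B", "A"),
  stringsAsFactors = FALSE
)

# Sum every non-grouping column within each Symbol and keep the result
merged <- toy %>%
  group_by(Symbol) %>%
  summarise(across(everything(), sum))

# Sanity check: one row per unique symbol is expected
nrow(merged) == length(unique(toy$Symbol))
```

The same sanity check can be run on the real data by comparing the row count of the result with `length(unique(df$Symbol))`.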

Using data.table:

library(data.table)
# setDT() converts df to a data.table by reference; .SD holds the non-grouping columns
setDT(df)[, lapply(.SD, sum), by = "Symbol"]
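Because `setDT()` changes the class of `df` in place, a variant that leaves the original data frame untouched can be useful. The following is only a sketch under assumptions: `toy`, `dt`, `num_cols` and `merged_dt` are hypothetical names, and restricting the sum to numeric columns via `.SDcols` is an added precaution rather than part of the answer above.

```r
library(data.table)

# Hypothetical toy data: symbol "A" appears twice
toy <- data.frame(
  SM01   = c(32, 246, 288),
  SM02   = c(29, 568, 366),
  Symbol = c("A", "B", "A"),
  stringsAsFactors = FALSE
)

# as.data.table() makes a copy, so `toy` itself stays a plain data.frame
# (setDT() would instead convert it in place)
dt <- as.data.table(toy)

# Sum only the numeric columns within each Symbol
num_cols  <- names(dt)[sapply(dt, is.numeric)]
merged_dt <- dt[, lapply(.SD, sum), by = Symbol, .SDcols = num_cols]
```

Limiting `.SDcols` to numeric columns avoids an error from `sum()` if the table also carries other character annotation columns.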



Answer 2:


We can use aggregate from base R:

aggregate(. ~ Symbol, df, FUN = sum)
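One caveat with the formula interface: its default `na.action` is `na.omit`, so any row containing an `NA` is dropped before the sums are computed. The excerpt in the question shows no missing values, so this may not matter there. The sketch below uses a hypothetical `toy` data frame with one `NA` to illustrate the difference.

```r
# Hypothetical toy data: symbol "A" appears twice, with one missing value
toy <- data.frame(
  SM01   = c(32, 246, 288),
  SM02   = c(29, 568, NA),
  Symbol = c("A", "B", "A"),
  stringsAsFactors = FALSE
)

# Default: the third row is removed entirely because of its NA,
# so for "A" only the first row contributes (SM01 = 32)
default_sum <- aggregate(. ~ Symbol, toy, FUN = sum)

# Keep rows with NA and let sum() skip the missing cells instead,
# so for "A" SM01 = 32 + 288 = 320 and SM02 = 29
keep_sum <- aggregate(. ~ Symbol, toy, FUN = sum,
                      na.rm = TRUE, na.action = na.pass)
```

As with the other approaches, the aggregated data frame is a new object and needs to be assigned to be kept.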


Source: https://stackoverflow.com/questions/41047877/merge-rows-with-duplicate-ids
