Checking duplicates, sum them and delete one row after summing

Submitted by 跟風遠走 on 2019-12-04 12:32:15

Question


I have a data frame that contains some duplicates: rows that share the same identifying columns but differ in two count columns. I want to sum those two columns across the duplicated rows and keep a single row.

Here is an example of the data:

Year    ID  Lats     Longs      N   n   c_id
2015    200 30.5417 -20.5254    150 30  4142
2015    200 30.5417 -20.5254    90  50  4142

I want to sum columns N and n into one row. The rest of the information (Year, ID, Lats, Longs and c_id) should remain the same, e.g.:

Year    ID  Lats     Longs      N   n   c_id
2015    200 30.5417 -20.5254    240 80  4142

Answer 1:


Solution using data.table:

library(data.table)

# Example data from the question
df <- structure(list(year = c(2015, 2015), ID = c(200, 200), Lats = c(30.5417, 
            30.5417), Longs = c(-20.5254, -20.5254), N = c(150, 90), n = c(30, 
            50), c_id = c(4142, 4142)), .Names = c("year", "ID", "Lats", 
            "Longs", "N", "n", "c_id"), row.names = c(NA, -2L), 
            class = "data.frame")
dt <- data.table(df)

# Sum every non-grouping column (here N and n) within each group of identical keys
dt[, lapply(.SD, sum), by="c_id,year,ID,Lats,Longs"]

   c_id year  ID    Lats    Longs   N  n
1: 4142 2015 200 30.5417 -20.5254 240 80

Solution using plyr:

library(plyr)

# Split df by the key columns, then sum N and n within each piece
ddply(df, .(c_id, year, ID, Lats, Longs), function(x) c(N=sum(x$N), n=sum(x$n)))

  c_id year  ID    Lats    Longs   N  n
1 4142 2015 200 30.5417 -20.5254 240 80
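
For comparison, the same grouping-and-summing idea can be sketched in base R with aggregate(), which avoids loading any extra package. This is not part of the original answer. It is a minimal sketch that assumes the key columns (c_id, year, ID, Lats, Longs) contain no missing values, because the formula interface of aggregate() drops rows with NA by default. The name res is only illustrative.

```r
# Same two-row example as in the question (column names follow the answer's df).
df <- data.frame(
  year  = c(2015, 2015),
  ID    = c(200, 200),
  Lats  = c(30.5417, 30.5417),
  Longs = c(-20.5254, -20.5254),
  N     = c(150, 90),
  n     = c(30, 50),
  c_id  = c(4142, 4142)
)

# Sum N and n within each group of identical key columns; rows that share
# c_id, year, ID, Lats and Longs collapse into a single row.
res <- aggregate(cbind(N, n) ~ c_id + year + ID + Lats + Longs,
                 data = df, FUN = sum)
print(res)
```

As with the two solutions above, the key columns come first in the result, followed by the summed N and n.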


Source: https://stackoverflow.com/questions/14152971/checking-duplicates-sum-them-and-delete-one-row-after-summing
