Consolidate duplicate rows


I have a data frame where one column is species' names, and the second column is abundance values. Due to the sampling procedure, some species appear more than once (i.e.,


6 Answers

礼貌的吻别

2020-12-01 02:21

An MWE to check whether the formula interface also works when grouping by a second variable (here "Z", in addition to "X"):

```
# Two grouping columns (X, Z) and one numeric value column (Y)
example <- data.frame(
  X = factor(c("x", "y", "z", "x", "x", "y")),
  Z = factor(c("a", "b", "a", "b", "b", "b")),
  Y = c(1, 1, 0.5, 1, 2, 10)
)
# Sum Y within each combination of X and Z
example_agg <- aggregate(Y ~ X + Z, data = example, FUN = sum)
```
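One way to confirm that the two-variable formula behaves as intended is to compare a few aggregated cells against sums worked out by hand. The sketch below rebuilds the same `example` data and looks groups up by key; the helper `get_sum` is a hypothetical name introduced only for this check.

```r
# Same data as the MWE above, built in one call
example <- data.frame(
  X = factor(c("x", "y", "z", "x", "x", "y")),
  Z = factor(c("a", "b", "a", "b", "b", "b")),
  Y = c(1, 1, 0.5, 1, 2, 10)
)
example_agg <- aggregate(Y ~ X + Z, data = example, FUN = sum)

# Hypothetical helper: fetch the aggregated sum for one (X, Z) combination
get_sum <- function(agg, x, z) agg$Y[agg$X == x & agg$Z == z]

# Duplicated (X, Z) pairs are collapsed; distinct pairs stay separate
xb <- get_sum(example_agg, "x", "b")  # rows with Y = 1 and Y = 2
yb <- get_sum(example_agg, "y", "b")  # rows with Y = 1 and Y = 10
xa <- get_sum(example_agg, "x", "a")  # single row with Y = 1
n_groups <- nrow(example_agg)         # one row per observed (X, Z) pair
```

Only combinations that actually occur in the data appear in the result, so an absent pair such as ("y", "a") produces no row rather than a zero.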


借酒劲吻你

2020-12-01 02:28
A dplyr solution:
```
library(dplyr)
df %>% group_by(x) %>% summarise(y = sum(y))
```

半阙折子戏

2020-12-01 02:28

```
> tapply(df$y, df$x, sum)
sp1 sp2 sp3 sp4 
  2   9   7   3
```

If it has to be a data.frame, Ben's answer works great. Or you can coerce the tapply output:

```
> out <- tapply(df$y, df$x, sum)
> data.frame(x=names(out), y=out, row.names=NULL)
    x y
1 sp1 2
2 sp2 9
3 sp3 7
4 sp4 3
```
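The question's data frame is not reproduced on this page, so the snippets cannot be run as-is. A small made-up `df` is enough to try them; the values below are hypothetical, chosen only so that the per-species sums match the output shown above, and are not the asker's data.

```r
# Hypothetical sample data (not from the original question):
# species names in x, abundance values in y, with some species repeated
df <- data.frame(
  x = c("sp1", "sp2", "sp3", "sp3", "sp1", "sp2", "sp4"),
  y = c(1, 4, 3, 4, 1, 5, 3),
  stringsAsFactors = FALSE
)

# Sum y within each species; the result is a named 1-d array
out <- tapply(df$y, df$x, sum)

# Coerce to a plain data frame; as.vector() drops the names from the y column
result <- data.frame(x = names(out), y = as.vector(out), row.names = NULL)
```

With this `df` in place, the dplyr, data.table, plyr and `aggregate` answers on this page can be run against the same input and compared.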


眼角桃花

2020-12-01 02:30
A data.table solution, for time and memory efficiency:
```
library(data.table)
DT <- as.data.table(df)
# which columns are numeric 
numeric_cols <- which(sapply(DT, is.numeric))
DT[, lapply(.SD, sum), by = x, .SDcols = numeric_cols]
```
Or, in your case, given that you know there is only the one column y you wish to sum over:
```
DT[, list(y=sum(y)),by=x]
```
余生分开走

2020-12-01 02:31
This works:
```
library(plyr)
ddply(df,"x",numcolwise(sum))
```
In words: (1) split the data frame df by the "x" column; (2) for each chunk, take the sum of each numeric-valued column; (3) stick the results back into a single data frame. (The "dd" in ddply stands for "take a data frame as input, return a data frame".)

Another, possibly clearer, approach:
```
aggregate(y~x,data=df,FUN=sum)
```
See "quick/elegant way to construct mean/variance summary table" for a related (slightly more complex) question.
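The three steps described in words above (split, sum, recombine) can also be spelled out in base R. This is only a sketch of the idea, not how plyr implements `ddply`; the sample `df` is hypothetical, and the grouping column is assumed to be named `x`.

```r
# Hypothetical sample data (not from the original question)
df <- data.frame(
  x = c("sp1", "sp2", "sp3", "sp3", "sp1", "sp2", "sp4"),
  y = c(1, 4, 3, 4, 1, 5, 3),
  stringsAsFactors = FALSE
)

# (1) Split the data frame into one chunk per value of x
chunks <- split(df, df$x)

# (2) For each chunk, sum every numeric column
summed <- lapply(chunks, function(chunk) {
  numeric_cols <- vapply(chunk, is.numeric, logical(1))
  sums <- lapply(chunk[numeric_cols], sum)  # named list, one entry per numeric column
  data.frame(x = chunk$x[1], sums, stringsAsFactors = FALSE)
})

# (3) Stick the per-chunk results back into a single data frame
combined <- do.call(rbind, summed)
rownames(combined) <- NULL
```

Writing the steps out by hand mostly helps with understanding; for real use, the one-line `ddply` or `aggregate` calls are shorter and less error-prone.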
温柔的废话

2020-12-01 02:43
Simple as aggregate:
```
aggregate(df['y'], by=df['x'], sum)
```