I have a data frame where one column is species' names, and the second column is abundance values. Due to the sampling procedure, some species appear more than once (i.e.,
An MWE to verify that a formula accounting for a second grouping variable (here "Z", in addition to "X") actually works:
# Build the example data in one call: X and Z are grouping factors, Y is numeric
example <- data.frame(
  X = factor(c("x", "y", "z", "x", "x", "y")),
  Z = factor(c("a", "b", "a", "b", "b", "b")),
  Y = c(1, 1, 0.5, 1, 2, 10)
)
# Sum Y within each combination of X and Z
example_agg <- aggregate(Y ~ X + Z, data = example, FUN = sum)
A dplyr solution:
library(dplyr)
df %>% group_by(x) %>% summarise(y = sum(y))
> tapply(df$y, df$x, sum)
sp1 sp2 sp3 sp4
  2   9   7   3
If it has to be a data.frame, Ben's answer works great. Or you can coerce the tapply output:
out <- tapply(df$y, df$x, sum)
> data.frame(x=names(out), y=out, row.names=NULL)
    x y
1 sp1 2
2 sp2 9
3 sp3 7
4 sp4 3
A data.table solution, for time and memory efficiency:
library(data.table)
DT <- as.data.table(df)
# which columns are numeric
numeric_cols <- which(sapply(DT, is.numeric))
DT[, lapply(.SD, sum), by = x, .SDcols = numeric_cols]
Or, in your case, given that there is only the one column, y, that you wish to sum over:
DT[, list(y=sum(y)),by=x]
This works:
library(plyr)
ddply(df,"x",numcolwise(sum))
In words: (1) split the data frame df by the "x" column; (2) for each chunk, take the sum of each numeric-valued column; (3) stick the results back into a single data frame. (The dd in ddply stands for "take a data frame as input, return a data frame".)
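For anyone curious what those three steps look like without plyr, here is a rough base R sketch. The df below is hypothetical stand-in data (the question's actual data frame isn't shown here), and this is only an illustration of the split/apply/combine idea, not how ddply is implemented internally.

```r
# Hypothetical stand-in for the question's df: species in x, abundance in y
df <- data.frame(
  x = c("sp1", "sp2", "sp1", "sp3", "sp2", "sp4"),
  y = c(1, 4, 1, 7, 5, 3)
)

# (1) split the data frame into one chunk per value of x
chunks <- split(df, df$x)

# (2) for each chunk, sum every numeric-valued column
sums <- lapply(chunks, function(chunk) {
  numeric_cols <- vapply(chunk, is.numeric, logical(1))
  data.frame(x = chunk$x[1], as.list(colSums(chunk[numeric_cols])))
})

# (3) stick the per-chunk results back into a single data frame
result <- do.call(rbind, sums)
rownames(result) <- NULL
print(result)
```

With a single numeric column this should give one row per species with its summed abundance; in practice ddply, aggregate, or data.table will be less typing.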
Another, possibly clearer, approach:
aggregate(y~x,data=df,FUN=sum)
See "quick/elegant way to construct mean/variance summary table" for a related (slightly more complex) question.
As simple as aggregate:
aggregate(df['y'], by=df['x'], sum)