I have a data frame with two grouping variables, one time variable and a dependent variable, e.g.:
name <- c("a", "a", "a", "a", "a", "a","a", "a", "a", "b", "b", "b","b", "b", "b","b", "b", "b")
class <- c("c1", "c1", "c1", "c2", "c2", "c2", "c3", "c3", "c3","c1", "c1", "c1", "c2", "c2", "c2", "c3", "c3", "c3")
year <- c("2010", "2009", "2008", "2010", "2009", "2008", "2010", "2009", "2008", "2010", "2009", "2008", "2010", "2009", "2008", "2010", "2009", "2008")
value <- c(100, 33, 80, 90, 80, 100, 100, 90, 80, 90, 80, 100, 100, 90, 80, 99, 80, 100)
df <- data.frame(name, class, year, value)
df
and I would like to apply the "diff" function within each combination of "class" and "name".
My desired output should look something like this:
name class year value.1
1 a c1 2010 -67
2 a c1 2009 47
3 b c1 2010 -10
4 b c1 2009 20
...
I tried
aggregate(value~name + class, data=df, FUN="diff")
which does not give the result I'm looking for on a large dataset. Thank you very much in advance!
Sebatian
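For reference, here is a minimal base-R sketch of what the aggregate() call in the question actually returns. It passes FUN = diff directly, which is the same function the string "diff" refers to. Because diff() returns several numbers per group, current versions of R pack them into a matrix column with one row per name/class group, rather than one row per difference, and the year column is lost. When the groups have different sizes, the column is left as a list instead. The object name agg is illustrative only.

```r
# Same example data as in the question, written more compactly
name  <- rep(c("a", "b"), each = 9)
class <- rep(rep(c("c1", "c2", "c3"), each = 3), times = 2)
year  <- rep(c("2010", "2009", "2008"), times = 6)
value <- c(100, 33, 80, 90, 80, 100, 100, 90, 80,
           90, 80, 100, 100, 90, 80, 99, 80, 100)
df <- data.frame(name, class, year, value)

# One output row per name/class group; diff() yields two numbers per group
agg <- aggregate(value ~ name + class, data = df, FUN = diff)

# The differences sit side by side in a 6 x 2 matrix column,
# not stacked one per row, and there is no year column
str(agg)
agg$value
```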
The plyr package is going to be your friend. The function ddply takes a data.frame, applies a function to each defined subset, then returns a data.frame of all the recombined pieces.
The simplest solution is to use summarize and diff(value) for each combination of .(class, name):
library(plyr)
ddply(df, .(class, name), summarize, diff(value))
class name ..1
1 c1 a -67
2 c1 a 47
3 c1 b -10
4 c1 b 20
5 c2 a -10
6 c2 a 20
7 c2 b -10
8 c2 b -10
9 c3 a -10
10 c3 a -10
11 c3 b -19
12 c3 b 20
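To include the years in the result, it's a little more involved. diff() returns one value fewer than its input, so head(year, -1) drops the last year of each group to keep both columns the same length: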
ddply(df, .(class, name), summarize, year=head(year, -1), value=diff(value))
class name year value
1 c1 a 2010 -67
2 c1 a 2009 47
3 c1 b 2010 -10
4 c1 b 2009 20
5 c2 a 2010 -10
6 c2 a 2009 20
7 c2 b 2010 -10
8 c2 b 2009 -10
9 c3 a 2010 -10
10 c3 a 2009 -10
11 c3 b 2010 -19
12 c3 b 2009 20
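The year version relies on the rows of each group already being sorted from newest to oldest year, because head(year, -1) simply drops the last row. A base-R sketch (an alternative, not part of the original answer) that makes the ordering explicit uses ave(), which returns a vector as long as its input; padding diff() with a trailing NA keeps each difference on the row of the newer of its two years, as in the desired output. The column name change is chosen here for illustration.

```r
# Same example data as in the question
name  <- rep(c("a", "b"), each = 9)
class <- rep(rep(c("c1", "c2", "c3"), each = 3), times = 2)
year  <- rep(c("2010", "2009", "2008"), times = 6)
value <- c(100, 33, 80, 90, 80, 100, 100, 90, 80,
           90, 80, 100, 100, 90, 80, 99, 80, 100)
df <- data.frame(name, class, year, value)

# Sort explicitly: by name, then class, then year from newest to oldest.
# Converting year to integer avoids relying on string comparison.
df <- df[order(df$name, df$class, -as.integer(as.character(df$year))), ]

# ave() applies FUN within each name/class group and keeps the input length,
# so diff() is padded with NA for each group's oldest year
df$change <- ave(df$value, df$name, df$class,
                 FUN = function(v) c(diff(v), NA))

# Drop the padded rows: the oldest year has no later row to difference against
res <- df[!is.na(df$change), c("name", "class", "year", "change")]
rownames(res) <- NULL
res
```

Unlike the ddply() output, this result is sorted by name first and then class; reorder the columns in order() if class should be the outer key.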
Source: https://stackoverflow.com/questions/8254508/function-diff-over-various-groups-in-r