Question
I am trying to use plyr but am having difficulty summarising over several variables at once. Here is an example.
df <- read.table(header=TRUE, text="
Firm Foreign SME Turnover
A1 N Y 200
A2 N N 1000
A3 Y Y 100
A1 N N 500
A2 Y Y 200
A3 Y Y 1000
A1 Y N 200
A2 N N 1000
A2 N Y 100
A2 N Y 200 ")
I am trying to create a table that summarizes Turnover by the two variables Foreign and SME, basically combining the following two calls:
t1 <- ddply(df, c('Firm', 'Foreign'), summarise,
BudgetForeign = sum(Turnover, na.rm = TRUE))
t2 <- ddply(df, c('Firm', 'SME'), summarise,
BudgetSME = sum(Turnover, na.rm = TRUE))
with the following result:
res <- read.table(header=TRUE, text="
Firm A1 A2 A3
BudgetForeign 200 200 1100
BudgetSME 200 500 1100")
res
How can I achieve this without running several separate operations and then subsetting and combining the results afterwards?
Thanks in advance.
Answer 1:
I think you only want the values where Foreign or SME is 'Y'. If that's the case, I would use melt and dcast from the reshape2 package rather than plyr.
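library(reshape2)
# Stack Foreign and SME into long format, keeping Firm and Turnover as identifiers
df.m <- melt(df, id.vars=c('Firm', 'Turnover'))
# Keep only the 'Y' rows, then sum Turnover per flag (rows) and firm (columns)
dcast(df.m[df.m$value=='Y',], variable ~ Firm, value.var='Turnover', fun.aggregate=sum)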
variable A1 A2 A3
1 Foreign 200 200 1100
2 SME 200 500 1100
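To see why filtering on value == 'Y' works, it can help to look at the long-format data that melt produces before it is cast back to wide. The following is a minimal sketch, assuming reshape2 is installed and using the df from the question; the name df.y is only illustrative.

```r
library(reshape2)

# Example data from the question
df <- read.table(header = TRUE, text = "
Firm Foreign SME Turnover
A1 N Y 200
A2 N N 1000
A3 Y Y 100
A1 N N 500
A2 Y Y 200
A3 Y Y 1000
A1 Y N 200
A2 N N 1000
A2 N Y 100
A2 N Y 200")

# Each original row appears twice in the melted data:
# once for the Foreign flag and once for the SME flag.
df.m <- melt(df, id.vars = c("Firm", "Turnover"))
names(df.m)  # identifier columns first, then "variable" and "value"
nrow(df.m)   # 2 * nrow(df)

# Keeping only the 'Y' rows leaves exactly the turnover that should
# be summed for each flag (illustrative name: df.y).
df.y <- df.m[df.m$value == "Y", ]
```

Because both flags end up in a single variable column, one dcast call can aggregate them together instead of one ddply call per flag.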
If you also want to see the differences between Y and N, you can add value to the formula in dcast:
dcast(df.m, variable + value ~ Firm, value.var='Turnover', fun.aggregate=sum)
variable value A1 A2 A3
1 Foreign N 700 2300 0
2 Foreign Y 200 200 1100
3 SME N 700 2000 0
4 SME Y 200 500 1100
Answer 2:
Thanks Justin. From your answer, the following code should solve my problem.
library(reshape2)
df.m <- melt(df, id.var=c('Firm', 'Turnover'))
x <- dcast(df.m, variable + value ~ Firm, value.var='Turnover', fun.aggregate=sum)
res <- rbind(
BudgetForeign = subset(x, variable == 'Foreign' & value == 'Y'),
BudgetSME = subset(x, variable == 'SME' & value == 'Y')
)
res
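For comparison, the same table can probably also be built with a single ddply call by subsetting Turnover inside summarise. This is only a sketch, assuming (as in Answer 1) that just the 'Y' rows should count toward each budget; the name by_firm is illustrative and not from the thread.

```r
library(plyr)

# Example data from the question
df <- read.table(header = TRUE, text = "
Firm Foreign SME Turnover
A1 N Y 200
A2 N N 1000
A3 Y Y 100
A1 N N 500
A2 Y Y 200
A3 Y Y 1000
A1 Y N 200
A2 N N 1000
A2 N Y 100
A2 N Y 200")

# One pass per firm: sum Turnover only over the rows flagged 'Y'
# (assumption: 'N' rows do not count toward either budget).
by_firm <- ddply(df, "Firm", summarise,
                 BudgetForeign = sum(Turnover[Foreign == "Y"], na.rm = TRUE),
                 BudgetSME     = sum(Turnover[SME == "Y"], na.rm = TRUE))

# Transpose so that firms become columns, as in the desired layout
res <- t(as.matrix(by_firm[, c("BudgetForeign", "BudgetSME")]))
colnames(res) <- as.character(by_firm$Firm)
res
```

This keeps everything in plyr at the cost of writing one summary expression per flag, so the reshape2 approach is likely to scale better when there are many flag columns.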
Source: https://stackoverflow.com/questions/11990830/using-multiple-variables-in-plyr