Calculating subtotals in R

北城以北 submitted on 2019-12-23 20:06:33


Name of member  Allowance Type                  Expenditure Type  Date        Amount, £

Adam Afriyie    Office running costs (IEP/AOE)  Incidentals       07/03/2009  111.09
Adam Afriyie    Office running costs (IEP/AOE)  Incidentals       11/05/2009  111.09
Adam Afriyie    Office running costs (IEP/AOE)  Incidentals       11/05/2009  51.75
Adam Holloway   Office running costs (IEP/AOE)  Incidentals       10/01/2009  35
Adam Holloway   Office running costs (IEP/AOE)  Incidentals       10/01/2009  413.23
Adam Holloway   Office running costs (IEP/AOE)  Incidentals       10/01/2009  9.55
Adam Holloway   Office running costs (IEP/AOE   IT equipment      07/03/2009  890.01
Adam Holloway   Communications Expenditure      Publications      12/04/2009  1774
Adam Holloway   Office running costs (IEP/AOE)  Incidentals       12/08/2009  1.1
Adam Holloway   Office running costs (IEP/AOE   Incidentals       12/08/2009  64.31
Adam Holloway   Office running costs (IEP/AOE)  Incidentals       12/08/2009  64.31

Hi, I'm new to R and new to programming. This is a subset of MPs' expenses over a certain time period. I want to subtotal each MP's expenses, and I used this code from another post:

aggregate(cbind(bsent, breturn, tsent, treturn, csales) ~ yname, data = foo,
          FUN = sum)

and adapted it to my own situation.

My code:

expenses2 <- aggregate(cbind(Amount..Â.) ~ Name.of.member, data = expenses, FUN = sum)

Although this code does perform some sort of aggregation, the numbers do not match up. For example, Adam Afriyie's expenses should total £273.93 (111.09 + 111.09 + 51.75), but this code gives 12697. I have no idea what this number represents. Can someone tell me what I'm doing wrong?

Thank you in advance
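One plausible explanation for a number like 12697, offered as an assumption since the thread never confirms it: if the Amount column was read as a factor (before R 4.0.0, read.csv() turned any column containing non-numeric text, such as "1,774", into a factor by default), then cbind() replaces that factor with its internal integer codes, and aggregate() sums those codes instead of the amounts. The sketch below reproduces the effect on a small made-up data frame (the names toy, wrong and right are hypothetical) and shows one way to convert the column back to numbers.

```r
# Made-up example data (hypothetical): Amount is stored as a factor,
# which is what happens when a "numeric" column contains text such as
# "1,774" and is read in with stringsAsFactors = TRUE.
toy <- data.frame(
  member = c("A", "A", "B"),
  Amount = factor(c("111.09", "51.75", "1,774"))
)

# cbind() replaces a factor by its internal integer codes, so these
# "subtotals" are sums of level codes, not of pounds.
wrong <- aggregate(cbind(Amount) ~ member, data = toy, FUN = sum)
print(wrong)

# Fix: convert via the level labels (as.character), drop the thousands
# separators, and only then convert to numeric.
toy$Amount <- as.numeric(gsub(",", "", as.character(toy$Amount), fixed = TRUE))
right <- aggregate(Amount ~ member, data = toy, FUN = sum)
print(right)
```

Going through as.character() matters: calling as.numeric() directly on a factor would return the same level codes that caused the problem.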


Using only your name column and your last amount column:

df <- data.frame(name = c(rep("Adam Afriyie", 3), rep("Adam Holloway", 8)),
                 amount = c(111.09, 111.09, 51.75, 35,
                   413.23, 9.55, 890.01, 1774, 1.1, 64.31, 64.31))

Version 1:

aggregate(df$amount, by = list(name = df$name), FUN = "sum")

Version 2:

aggregate(amount ~ name, data = df, FUN = "sum")

Both versions give:
1  Adam Afriyie  273.93
2  Adam Holloway 3251.51
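If a plain named vector of subtotals is enough, base R's tapply() is another option. It is a different function from the aggregate() calls above and is named here as an alternative, not as part of the original answer. This sketch rebuilds the same df so it runs on its own.

```r
# Same toy data as above: one member name per row and its amount.
df <- data.frame(
  name = c(rep("Adam Afriyie", 3), rep("Adam Holloway", 8)),
  amount = c(111.09, 111.09, 51.75, 35,
             413.23, 9.55, 890.01, 1774, 1.1, 64.31, 64.31)
)

# tapply() splits amount by name and applies sum() to each group,
# returning an array whose names are the member names.
subtotals <- tapply(df$amount, df$name, sum)
print(subtotals)

# Grand total, handy for checking that the subtotals add up.
print(sum(subtotals))
```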


I pulled that text into an editor, made valid header names, put back the tabs that had apparently been replaced with spaces, and read it into R, getting this object:

    MPexp <- structure(list(Name_of_member = c("Adam Afriyie", "Adam Afriyie", 
    "Adam Afriyie", "Adam Holloway", "Adam Holloway", "Adam Holloway", 
    "Adam Holloway", "Adam Holloway", "Adam Holloway", "Adam Holloway", 
    "Adam Holloway"), Allowance_Type = c("Office running costs (IEP/AOE)", 
    "Office running costs (IEP/AOE)", "Office running costs (IEP/AOE)", 
    " Office running costs (IEP/AOE)", " Office running costs (IEP/AOE)", 
    " Office running costs (IEP/AOE)", " Office running costs (IEP/AOE", 
    " Communications Expenditure", " Office running costs (IEP/AOE)", 
    " Office running costs (IEP/AOE", " Office running costs (IEP/AOE)"
    ), Expenditure_Tyoe = c("Incidentals", "Incidentals", "Incidentals", 
    "Incidentals", "Incidentals", "Incidentals", "IT equipment", 
    "Publications", "Incidentals", "Incidentals", "Incidentals"), 
        Date = c("07/03/09", "11/05/09", "11/05/09", "10/01/09", 
        "10/01/09", "10/01/09", "07/03/09", "12/04/09", "12/08/09", 
        "12/08/09", "12/08/09"), Amount = c(111.09, 111.09, 51.75, 
        35, 413.23, 9.55, 890.01, 1774, 1.1, 64.31, 64.31)), .Names = c("Name_of_member", 
    "Allowance_Type", "Expenditure_Tyoe", "Date", "Amount"), 
    class = "data.frame", row.names = c(NA, -11L))
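To repeat the re-reading step yourself instead of pasting the dput() output, a minimal sketch follows. It assumes the data are tab-separated and uses read.delim(text = ...) on a few re-typed rows; the reduced column set and the plain "Amount" header (instead of "Amount, £") are simplifications made for this example, and the names txt, MP and subtotals_mp are hypothetical.

```r
# A few rows of the question's data, re-typed with tab separators
# (an assumption about how the original file was laid out).
txt <- paste(
  "Name of member\tExpenditure Type\tDate\tAmount",
  "Adam Afriyie\tIncidentals\t07/03/2009\t111.09",
  "Adam Afriyie\tIncidentals\t11/05/2009\t111.09",
  "Adam Afriyie\tIncidentals\t11/05/2009\t51.75",
  "Adam Holloway\tPublications\t12/04/2009\t1774",
  sep = "\n"
)

# read.delim() splits on tabs; check.names = TRUE (the default) turns
# "Name of member" into the syntactic name "Name.of.member".
MP <- read.delim(text = txt, stringsAsFactors = FALSE)
str(MP)

# Amount is numeric here, so sum() adds the actual amounts.
subtotals_mp <- aggregate(Amount ~ Name.of.member, data = MP, FUN = sum)
print(subtotals_mp)
```

Running str() right after reading is a cheap way to confirm that the amount column came in as num rather than as a factor or character.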

Now this should yield the expected result with aggregate:

> aggregate(MPexp$Amount, MPexp["Name_of_member"], sum)
  Name_of_member       x
1   Adam Afriyie  273.93
2  Adam Holloway 3251.51

Reading your question again, I realized that you were using the formula method of aggregate (aggregate.formula), so this also works on that data:

> aggregate(Amount ~ Name_of_member, data=MPexp, FUN=sum)
  Name_of_member  Amount
1   Adam Afriyie  273.93
2  Adam Holloway 3251.51
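The formula interface also accepts several grouping variables on the right-hand side, which gives a subtotal for each combination, for example per member and expenditure type. A sketch on a small made-up subset of the question's rows (the column names member, type and amount are chosen for this example):

```r
# A small subset of the question's rows, with simplified column names.
exp_df <- data.frame(
  member = c("Adam Afriyie", "Adam Afriyie", "Adam Holloway",
             "Adam Holloway", "Adam Holloway"),
  type = c("Incidentals", "Incidentals", "Incidentals",
           "IT equipment", "Publications"),
  amount = c(111.09, 51.75, 35, 890.01, 1774)
)

# Subtotal per member and expenditure type: each distinct
# (member, type) pair that occurs in the data becomes one result row.
by_type <- aggregate(amount ~ member + type, data = exp_df, FUN = sum)
print(by_type)
```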


Another approach, using plyr:


# Using the df data frame from mropa's answer
> library(plyr)
> ddply(df, .(name), summarise, sum = sum(amount))
           name     sum
1  Adam Afriyie  273.93
2 Adam Holloway 3251.51
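As for the cbind() in the code borrowed from the other post: on the left-hand side of the formula it is meant for subtotalling several numeric columns in one call, and with a single numeric column it is not needed. A sketch with hypothetical data (costs, office and travel are made-up names):

```r
# Hypothetical data with two numeric columns per row.
costs <- data.frame(
  member = c("A", "A", "B"),
  office = c(10, 20, 5),
  travel = c(1, 2, 3)
)

# cbind() on the left-hand side subtotals each listed column per member.
both <- aggregate(cbind(office, travel) ~ member, data = costs, FUN = sum)
print(both)
```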

