R: Aggregate character strings [duplicate]

Question

I have a data frame, ModelDF, whose columns hold both numeric and character values, like this:

Quantity        Type        Mode        Company
   1            Shoe        hello        Nike
   1            Shoe        hello        Nike
   2            Jeans       hello        Levis
   3            Shoe        hello        Nike
   1            Jeans       hello        Levis
   1            Shoe        hello        Adidas
   2            Jeans       hello        Spykar
   1            Shoe        ahola        Nike
   1            Jeans       ahola        Levis

I need to aggregate it into this form:

Quantity        Type        Mode        Company
   5            Shoe        hello        Nike
   3            Jeans       hello        Levis
   1            Shoe        hello        Adidas
   2            Jeans       hello        Spykar
   1            Shoe        ahola        Nike
   1            Jeans       ahola        Levis

That is, I need to sum Quantity over all rows whose other columns are identical.

I tried aggregate, but it gives me wrong results, apparently because it doesn't work on character values.

What are my options? Thanks

Answer 1:

You don't want to 'aggregate strings'; you want to aggregate numerics 'by' string variables. For example:

R> xx = data.frame(a=sample(letters[1:3], 10, TRUE),
                   b=sample(LETTERS[1:3], 10, TRUE),
                   c=runif(10))
R> xx
   a b         c
1  b C 0.7094221
2  c B 0.2718095
3  c B 0.8844701
4  b C 0.9270141
5  b C 0.8243021
6  a A 0.3649902
7  a B 0.9763228
8  a A 0.8904676
9  b C 0.8640352
10 a A 0.7931683
R> aggregate(c ~ a + b, data=xx, FUN=sum)
  a b         c
1 a A 2.0486261
2 a B 0.9763228
3 c B 1.1562796
4 b C 3.3247736
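
To connect this to the data in the question, here is a minimal sketch. It assumes ModelDF stores Type, Mode and Company as plain character columns; the question does not show how the data frame was built, so the construction below is a guess. The same formula pattern groups on character columns without any conversion.

```r
# Hypothetical reconstruction of ModelDF from the table in the question.
# stringsAsFactors = FALSE keeps the text columns as character, not factor.
ModelDF <- data.frame(
  Quantity = c(1, 1, 2, 3, 1, 1, 2, 1, 1),
  Type     = c("Shoe", "Shoe", "Jeans", "Shoe", "Jeans",
               "Shoe", "Jeans", "Shoe", "Jeans"),
  Mode     = c("hello", "hello", "hello", "hello", "hello",
               "hello", "hello", "ahola", "ahola"),
  Company  = c("Nike", "Nike", "Levis", "Nike", "Levis",
               "Adidas", "Spykar", "Nike", "Levis"),
  stringsAsFactors = FALSE
)

# Left of ~ is the numeric column to sum; right of ~ are the grouping columns.
result <- aggregate(Quantity ~ Type + Mode + Company, data = ModelDF, FUN = sum)
print(result)
```

The result has one row per distinct Type/Mode/Company combination, with Quantity summed within each.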

Answer 2:

aggregate(Quantity ~ Type + Mode + Company, df, sum)
#   Type  Mode Company Quantity
#1  Shoe hello  Adidas        1
#2 Jeans ahola   Levis        1
#3 Jeans hello   Levis        3
#4  Shoe ahola    Nike        1
#5  Shoe hello    Nike        5
#6 Jeans hello  Spykar        2
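
The formula call can also be written with aggregate's by = interface. This is a sketch, assuming df holds the same nine rows as in the DATA section (rebuilt here from plain vectors so the snippet runs on its own). Only the numeric column is passed as x, and the grouping columns go in by. Passing text columns in x instead would make sum() fail, which may be what went wrong in the question, though that is a guess.

```r
# Assumed stand-in for df: same rows as the DATA section, with factor columns.
df <- data.frame(
  Quantity = c(1L, 1L, 2L, 3L, 1L, 1L, 2L, 1L, 1L),
  Type     = c("Shoe", "Shoe", "Jeans", "Shoe", "Jeans",
               "Shoe", "Jeans", "Shoe", "Jeans"),
  Mode     = c(rep("hello", 7), rep("ahola", 2)),
  Company  = c("Nike", "Nike", "Levis", "Nike", "Levis",
               "Adidas", "Spykar", "Nike", "Levis"),
  stringsAsFactors = TRUE
)

# x: only the column(s) to be summed; by: the grouping columns.
# A data frame is a list, so a column subset can be passed directly as `by`.
by_result <- aggregate(df["Quantity"],
                       by  = df[c("Type", "Mode", "Company")],
                       FUN = sum)
print(by_result)
```

The grouping columns come first in the result, followed by the summed Quantity.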

You can also try the data.table option:

library(data.table)
setDT(df)[, .(Sum.Quantity = sum(Quantity)), by = list(Type, Mode, Company)]

#    Type  Mode Company Sum.Quantity
#1:  Shoe hello    Nike            5
#2: Jeans hello   Levis            3
#3:  Shoe hello  Adidas            1
#4: Jeans hello  Spykar            2
#5:  Shoe ahola    Nike            1
#6: Jeans ahola   Levis            1

Similarly with dplyr:

library(dplyr)
df %>% 
  group_by(Type, Mode, Company) %>% 
  summarise(Sum.Quantity = sum(Quantity))

DATA

dput(df)
structure(list(Quantity = c(1L, 1L, 2L, 3L, 1L, 1L, 2L, 1L, 1L
), Type = structure(c(2L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L), .Label = c("Jeans", 
"Shoe"), class = "factor"), Mode = structure(c(2L, 2L, 2L, 2L, 
2L, 2L, 2L, 1L, 1L), .Label = c("ahola", "hello"), class = "factor"), 
    Company = structure(c(3L, 3L, 2L, 3L, 2L, 1L, 4L, 3L, 2L), .Label = c("Adidas", 
    "Levis", "Nike", "Spykar"), class = "factor")), .Names = c("Quantity", 
"Type", "Mode", "Company"), class = "data.frame", row.names = c(NA, 
-9L))

Source: https://stackoverflow.com/questions/37180479/r-aggregate-character-strings

Tags

aggregate