Question
I have a data frame ModelDF with both numeric and character columns, like this:
Quantity Type Mode Company
1 Shoe hello Nike
1 Shoe hello Nike
2 Jeans hello Levis
3 Shoe hello Nike
1 Jeans hello Levis
1 Shoe hello Adidas
2 Jeans hello Spykar
1 Shoe ahola Nike
1 Jeans ahola Levis
I need to aggregate it into this form:
Quantity Type Mode Company
5 Shoe hello Nike
3 Jeans hello Levis
1 Shoe hello Adidas
2 Jeans hello Spykar
1 Shoe ahola Nike
1 Jeans ahola Levis
That is, I need to sum Quantity over all rows whose other columns are identical.
I tried aggregate, but since it doesn't work on character values, it gives me wrong results.
What are my options? Thanks
Answer 1:
You don't want to 'aggregate strings', you want to aggregate numerics 'by' string variables. Here:
R> xx = data.frame(a=sample(letters[1:3], 10, TRUE),
b=sample(LETTERS[1:3], 10, TRUE),
c=runif(10))
R> xx
a b c
1 b C 0.7094221
2 c B 0.2718095
3 c B 0.8844701
4 b C 0.9270141
5 b C 0.8243021
6 a A 0.3649902
7 a B 0.9763228
8 a A 0.8904676
9 b C 0.8640352
10 a A 0.7931683
R> aggregate(c ~ a + b, data=xx, FUN=sum)
a b c
1 a A 2.0486261
2 a B 0.9763228
3 c B 1.1562796
4 b C 3.3247736
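The same idea can be applied to the data from the question. The sketch below assumes the grouping columns are plain character vectors rather than factors, and uses aggregate's `by =` interface instead of the formula. The question does not show the failing call, so it is only a guess that the character columns were passed as the values to summarise; here only the numeric column is passed as `x`.

```r
# The question's data, with the grouping columns kept as character vectors
ModelDF <- data.frame(
  Quantity = c(1, 1, 2, 3, 1, 1, 2, 1, 1),
  Type     = c("Shoe", "Shoe", "Jeans", "Shoe", "Jeans",
               "Shoe", "Jeans", "Shoe", "Jeans"),
  Mode     = c(rep("hello", 7), rep("ahola", 2)),
  Company  = c("Nike", "Nike", "Levis", "Nike", "Levis",
               "Adidas", "Spykar", "Nike", "Levis"),
  stringsAsFactors = FALSE
)

# Only the numeric column is summed; the character columns define the groups
res <- aggregate(x   = ModelDF["Quantity"],
                 by  = ModelDF[c("Type", "Mode", "Company")],
                 FUN = sum)
print(res)
```

This should give one row per distinct Type/Mode/Company combination, with Quantity summed within each group.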
Answer 2:
aggregate(Quantity ~ Type + Mode + Company, df, sum)
# Type Mode Company Quantity
#1 Shoe hello Adidas 1
#2 Jeans ahola Levis 1
#3 Jeans hello Levis 3
#4 Shoe ahola Nike 1
#5 Shoe hello Nike 5
#6 Jeans hello Spykar 2
You can also try the data.table option:
library(data.table)
setDT(df)[, .(Sum.Quantity = sum(Quantity)), by = list(Type, Mode, Company)]
# Type Mode Company Sum.Quantity
#1: Shoe hello Nike 5
#2: Jeans hello Levis 3
#3: Shoe hello Adidas 1
#4: Jeans hello Spykar 2
#5: Shoe ahola Nike 1
#6: Jeans ahola Levis 1
Similarly with dplyr:
library(dplyr)
df %>%
  group_by(Type, Mode, Company) %>%
  summarise(sum(Quantity))
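As written, the summary column ends up named `sum(Quantity)`. A small variation names it explicitly and drops the grouping afterwards. This is only a sketch: the `.groups = "drop"` argument assumes dplyr 1.0.0 or later, and `df` is rebuilt here with `data.frame()` purely so the snippet runs on its own.

```r
library(dplyr)

# Stand-alone copy of the question's data (same values as the DATA section)
df <- data.frame(
  Quantity = c(1L, 1L, 2L, 3L, 1L, 1L, 2L, 1L, 1L),
  Type     = c("Shoe", "Shoe", "Jeans", "Shoe", "Jeans",
               "Shoe", "Jeans", "Shoe", "Jeans"),
  Mode     = c(rep("hello", 7), rep("ahola", 2)),
  Company  = c("Nike", "Nike", "Levis", "Nike", "Levis",
               "Adidas", "Spykar", "Nike", "Levis")
)

res <- df %>%
  group_by(Type, Mode, Company) %>%
  # Name the result column and return an ungrouped tibble
  summarise(Quantity = sum(Quantity), .groups = "drop")
print(res)
```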
DATA
dput(df)
structure(list(Quantity = c(1L, 1L, 2L, 3L, 1L, 1L, 2L, 1L, 1L
), Type = structure(c(2L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L), .Label = c("Jeans",
"Shoe"), class = "factor"), Mode = structure(c(2L, 2L, 2L, 2L,
2L, 2L, 2L, 1L, 1L), .Label = c("ahola", "hello"), class = "factor"),
Company = structure(c(3L, 3L, 2L, 3L, 2L, 1L, 4L, 3L, 2L), .Label = c("Adidas",
"Levis", "Nike", "Spykar"), class = "factor")), .Names = c("Quantity",
"Type", "Mode", "Company"), class = "data.frame", row.names = c(NA,
-9L))
Source: https://stackoverflow.com/questions/37180479/r-aggregate-character-strings