Question
I am having trouble grouping and summing the following data in R:
   category freq
1        C1    9
2        C2   39
3        C3    3
4        A1   38
5        A2    2
6        A3   29
7        B1  377
8        B2  214
9        B3  790
10       B4  724
11       D1  551
12       D2  985
13       E5   19
14       E4   28
to look like this:
  category freq
1        A   69
2        B 2105
3        C   51
4        D 1536
5        E   47
I usually use ddply to aggregate data by an attribute, but that only sums the rows that share exactly the same value in a given column. I need to be able to specify multiple values that should be lumped into one category.
Answer 1:
Why don't you add a column to your data frame that holds the letter part of your "category" column? Then you could use ddply.
Example:
df = data.frame(id = c(1,2,3,4,5), category = c("AB1", "AB2", "B1", "B2", "B3"), freq = c(50,51,3,2,26))
df$new = as.factor(gsub("\\d", "", df$category))
You could then use ddply based on the new column, as follows:
library(plyr)
# sum freq within each level of the new column
result <- ddply(df, .(new), summarize, freq = sum(freq))
You get the following result:
# new freq
#1 AB 101
#2 B 31
This works only if you intend to group all the categories that share the same alphabetical prefix under one umbrella category.
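Applied to the data from the question, the same idea can be sketched as follows. This is only a minimal illustration: it uses base R's aggregate() instead of ddply so that it runs without plyr, and the column name group is my own choice, not something from the question.

```r
# Data from the question, rebuilt by hand
df <- data.frame(
  category = c("C1", "C2", "C3", "A1", "A2", "A3", "B1", "B2",
               "B3", "B4", "D1", "D2", "E5", "E4"),
  freq = c(9, 39, 3, 38, 2, 29, 377, 214, 790, 724, 551, 985, 19, 28),
  stringsAsFactors = FALSE
)

# Strip the digits so that "C1", "C2" and "C3" all become "C"
# ("group" is an illustrative column name)
df$group <- gsub("\\d", "", df$category)

# Sum freq within each group (base R stand-in for the ddply call)
totals <- aggregate(freq ~ group, data = df, FUN = sum)
print(totals)
```

This should reproduce the totals asked for in the question (A 69, B 2105, C 51, D 1536, E 47).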
If, however, you wish to group arbitrary categories under one category (for example, KG, XM and L4 would all belong to the same category), you could define new "super" categories and assign each sub-category to the appropriate one. One way that I can think of is the switch function. Please see the example below:
df = data.frame(id = c(1,2,3,4,5), category = c("A", "B", "KG", "XM", "L4"), freq = c(50,51,3,2,26))
fct <- function(x) {switch(x, "A" = "CAT1", "B" = "CAT2", "KG" = "CAT3", "XM" = "CAT3", "L4" = "CAT3")}
# as.character() guards against category being a factor, which switch() would treat as an integer
df$new = as.factor(unlist(lapply(as.character(df$category), fct)))
result <- ddply(df, .(new), summarize, freq = sum(freq))
This will give you:
# new freq
#1 CAT1 50
#2 CAT2 51
#3 CAT3 31
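If the mapping grows, a named character vector used as a lookup table may be easier to maintain than switch. The sketch below is an alternative, not part of the approach above: the name lookup is hypothetical, and base R's aggregate() again stands in for ddply so the snippet is self-contained.

```r
df <- data.frame(
  category = c("A", "B", "KG", "XM", "L4"),
  freq = c(50, 51, 3, 2, 26),
  stringsAsFactors = FALSE
)

# Hypothetical lookup table: names are sub-categories, values are "super" categories
lookup <- c(A = "CAT1", B = "CAT2", KG = "CAT3", XM = "CAT3", L4 = "CAT3")

# Index the named vector by category; a category missing from lookup becomes NA
df$new <- unname(lookup[df$category])

# Sum freq within each "super" category
totals <- aggregate(freq ~ new, data = df, FUN = sum)
print(totals)
```

Note that any sub-category missing from lookup ends up as NA, and the formula interface of aggregate() drops NA rows by default, so it is worth checking anyNA(df$new) before summing.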
Source: https://stackoverflow.com/questions/18366387/r-aggregate-data-by-defining-grouping