问题
I have a data set with 5 columns:
store_id year event item units
123 2015 sale_2 abc 2
234 2015 sale_3 def 1
345 2015 sale_2 xyz 5
I'm trying to rotate out the items by store_id, year, and event to get the sum. For instance
store_id year event abc def xyz
123 2015 sale_2 7 0 0
234 2015 sale_2 2 1 0
I'm having trouble figuring out the best method. Normally I'd use dummyVars in caret to do this but I need sums instead of flag. I've looked at tapply but it can't handle more than 2 grouping variables.
Any other suggestions?
回答1:
library(reshape2)
dcast(df, store_id + year + event ~ item, fun.aggregate = sum, value.var='units')
# store_id year event abc def xyz
# 1: 123 2015 sale_2 2 0 0
# 2: 234 2015 sale_3 0 1 0
# 3: 345 2015 sale_2 0 0 5
For large datasets consider
# uses dcast.data.table, much faster
library(data.table)
setDT(df)
dcast(df, store_id + year + event ~ item, fun.aggregate = sum, value.var='units')
回答2:
library(dplyr)
library(tidyr)
data %>%
group_by(store_id, year, event, item) %>%
summarize(N = sum(units)) %>%
spread(item, N)
You can use dplyr to group and summarize and tidyr to spread the data into the desired item columns.
来源:https://stackoverflow.com/questions/45150269/pivot-using-multiple-columns