I have a data set like the one below and I want to calculate the average time difference for each unique id.
data:
membership_id created_date
1 12000000 2015
Coming from plyr, you can probably transition very easily to dplyr. It won't be quite as fast as data.table, but it will be much faster than ddply.
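library(dplyr)

# Sort by date so that diff() measures the gaps between consecutive dates of each id
dat %>% group_by(membership_id) %>%
  arrange(created_date) %>%
  summarize(avg = as.numeric(mean(diff(created_date))))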
# Source: local data frame [3 x 2]
#
# membership_id avg
# (int) (dbl)
# 1 12000000 555
# 2 12000001 262
# 3 12000003 391
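To see what `mean(diff(...))` computes, here is a minimal base-R check for a single id. It uses the three dates that belong to 12000000 in the sample data, written as day counts since 1970-01-01; the variable names are only for illustration.

```r
# Dates for membership_id 12000000, as day counts since 1970-01-01
d <- as.Date(c(16455, 16106, 15345), origin = "1970-01-01")

# Gaps between consecutive dates once sorted: 761 and 349 days
gaps <- diff(sort(d))

# Average gap in days, as a plain number
avg_gap <- as.numeric(mean(gaps))
avg_gap  # 555
```

An id with only one row has no gaps, so `mean(diff(...))` gives `NaN` for it.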
With little extra effort, you can speed things up even more by converting to a data.table object while still using the dplyr commands. Pure data.table will still be even faster.
(Using this data)
dat = structure(list(membership_id = c(12000000L, 12000001L, 12000001L,
12000001L, 12000001L, 12000003L, 12000003L, 12000000L, 12000000L
), created_date = structure(c(16455, 15663, 15985, 16135, 16449,
15744, 16135, 16106, 15345), class = "Date")), .Names = c("membership_id",
"created_date"), row.names = c("1", "2", "3", "4", "5", "6",
"7", "8", "9"), class = "data.frame")
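For comparison, a pure data.table version might look like the sketch below. It is not part of the original answer: it rebuilds the same nine rows in a compact form so that it runs on its own, and the names `dt` and `res` are chosen here for illustration.

```r
library(data.table)

# Same nine rows as the sample data, rebuilt compactly
dat <- data.frame(
  membership_id = c(12000000L, 12000001L, 12000001L, 12000001L, 12000001L,
                    12000003L, 12000003L, 12000000L, 12000000L),
  created_date = as.Date(c(16455, 15663, 15985, 16135, 16449,
                           15744, 16135, 16106, 15345),
                         origin = "1970-01-01")
)

dt <- as.data.table(dat)

# Order rows by id and date, then average the consecutive gaps within each id
res <- dt[order(membership_id, created_date),
          .(avg = as.numeric(mean(diff(created_date)))),
          by = membership_id]
res
```

On the sample data this should give the same three averages as the dplyr version (555, 262 and 391 days).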