How to calculate the average difference between dates by ID in R

情歌与酒 2021-01-28 17:08

I have a data set like the one below, and I want to calculate the average time difference between consecutive dates for each unique ID.

data:
   membership_id created_date 
1       12000000 2015         


        
1 answer
  • 2021-01-28 17:26

    Coming from plyr, you can probably transition to dplyr very easily. It won't be quite as fast as data.table, but it will be much faster than ddply.

    library(dplyr)

    # Sort by date, then average the gaps (in days) between consecutive dates per ID
    dat %>% group_by(membership_id) %>%
        arrange(created_date) %>%
        summarize(avg = as.numeric(mean(diff(created_date))))
    # Source: local data frame [3 x 2]
    #
    #   membership_id   avg
    #           (int) (dbl)
    # 1      12000000   555
    # 2      12000001   262
    # 3      12000003   391
    

    With little extra effort, you can speed things up further by converting to a data.table object while still using the dplyr commands. Pure data.table will be faster still.

    (Using this data)

    dat = structure(list(membership_id = c(12000000L, 12000001L, 12000001L, 
    12000001L, 12000001L, 12000003L, 12000003L, 12000000L, 12000000L
    ), created_date = structure(c(16455, 15663, 15985, 16135, 16449, 
    15744, 16135, 16106, 15345), class = "Date")), .Names = c("membership_id", 
    "created_date"), row.names = c("1", "2", "3", "4", "5", "6", 
    "7", "8", "9"), class = "data.frame")
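
    For comparison, the pure data.table route mentioned above might look roughly like the sketch below. It is not part of the original answer: the names `dt` and `res` are made up for illustration, the sample data is rebuilt compactly so the block runs on its own, and `keyby` is used only so the result comes back sorted by ID.

```r
library(data.table)

# Same sample data as above, rebuilt compactly (dates are days since 1970-01-01).
dat <- data.frame(
  membership_id = c(12000000L, 12000001L, 12000001L, 12000001L, 12000001L,
                    12000003L, 12000003L, 12000000L, 12000000L),
  created_date = as.Date(c(16455, 15663, 15985, 16135, 16449,
                           15744, 16135, 16106, 15345), origin = "1970-01-01")
)

# Hypothetical name: dt is just a data.table copy of dat.
dt <- as.data.table(dat)

# Sort rows by date, then average the gaps (in days) between
# consecutive dates within each membership_id.
res <- dt[order(created_date),
          .(avg = as.numeric(mean(diff(created_date)))),
          keyby = membership_id]

print(res)
```

    On this sample data the averages should match the dplyr result (555, 262 and 391 days). An ID with only one date has no gaps to average, so its `avg` would come out as NaN.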
    