Aggregate daily level data to weekly level in R

无人及你 2021-01-03 16:55

I have a huge dataset similar to the following reproducible sample data.

   Interval    value
1  2012-06-10   552
2  2012-06-11  4850
3  2012-06-12  4642
4          


        
5 Answers
  •  轮回少年
    2021-01-03 17:30

    I just came across this old question because it was used as a duplicate target.

    Unfortunately, all the upvoted answers (except the one by konvas and a now-deleted one) aggregate the data by week of the year, whereas the OP asked to aggregate by week of the month.

    The definitions of week of the year and week of the month are ambiguous, as discussed here, here, and here.
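
    To make the ambiguity concrete, here is a small base R sketch (not taken from the original answers) that formats one date with the three week-of-year conversion specifications `%U`, `%V`, and `%W`. The variable names are only illustrative.

    ```r
    # 2012-01-01 was a Sunday
    d <- as.Date("2012-01-01")

    # %U: weeks start on Sunday, %V: ISO 8601 week, %W: weeks start on Monday
    week_numbers <- sapply(
      c(sun_start = "%U", iso = "%V", mon_start = "%W"),
      function(fmt) format(d, fmt)
    )
    print(week_numbers)
    ```

    The same day is reported as week 01, week 52 (of the previous ISO year), and week 00, depending on the convention.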

    However, the OP has indicated that they want to count days 1 to 7 of each month as week 1 of the month, days 8 to 14 as week 2 of the month, and so on. Note that under this definition week 5 is a stub of only 2 or 3 days (days 29 to 31) in most months; in February it has a single day in a leap year and does not exist otherwise.
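
    The rule boils down to integer division of the day of the month. A minimal base R sketch follows; `week_of_month` is a hypothetical helper name, and it reads the day via `format()` rather than data.table's `mday()`.

    ```r
    # Hypothetical helper: days 1-7 -> week 1, 8-14 -> week 2, ..., 29-31 -> week 5
    week_of_month <- function(date) {
      day <- as.integer(format(date, "%d"))  # day of month, 1..31
      (day - 1L) %/% 7L + 1L
    }

    dates <- as.Date(c("2012-06-07", "2012-06-08", "2012-06-28",
                       "2012-06-29", "2012-02-29"))
    weeks <- week_of_month(dates)
    print(data.frame(date = dates, week = weeks))
    ```

    The boundary dates map to weeks 1, 2, 4, 5, and 5: the 7th still belongs to week 1, the 8th starts week 2, and the 29th opens the stub week 5.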

    Having prepared the ground, here is a data.table solution for this kind of aggregation:

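    library(data.table)
    # Sum value per week of month; the group label combines week number, month, and year
    DT[, .(value = sum(value)),
       by = .(Interval = sprintf("Week %i, %s",
                                 (mday(Interval) - 1L) %/% 7L + 1L,  # days 1-7 -> 1, 8-14 -> 2, ...
                                 format(Interval, "%b %Y")))]        # e.g. "Jun 2012"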
    
               Interval value
    1: Week 2, Jun 2012 18366
    2: Week 3, Jun 2012 24104
    3: Week 4, Jun 2012 23348
    4: Week 5, Jun 2012  5204
    5: Week 1, Jul 2012 23579
    6: Week 2, Jul 2012 11573
    

    We can verify that the correct intervals were picked by also reporting the first and last date of each group:

    DT[, .(value = sum(value),
           date_range = toString(range(Interval))),
       by = .(Week = sprintf("Week %i, %s",
                             (mday(Interval) - 1L) %/% 7L + 1L,
                             format(Interval, "%b %Y")))]
    
                   Week value             date_range
    1: Week 2, Jun 2012 18366 2012-06-10, 2012-06-14
    2: Week 3, Jun 2012 24104 2012-06-15, 2012-06-21
    3: Week 4, Jun 2012 23348 2012-06-22, 2012-06-28
    4: Week 5, Jun 2012  5204 2012-06-29, 2012-06-30
    5: Week 1, Jul 2012 23579 2012-07-01, 2012-07-07
    6: Week 2, Jul 2012 11573 2012-07-08, 2012-07-10
    

    which is in line with OP's specification.
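
    For readers without data.table, the same week-of-month arithmetic can be sketched in base R with `aggregate()`. This is an illustrative alternative, not part of the solution above: the data frame `df` holds only the first seven rows of the sample data, and the columns `week` and `month` are names introduced here.

    ```r
    # First seven rows of the sample data (2012-06-10 to 2012-06-16)
    df <- data.frame(
      Interval = as.Date("2012-06-10") + 0:6,
      value = c(552, 4850, 4642, 4132, 4190, 4186, 1139)
    )

    # Same week-of-month rule as in the data.table solution, plus a year-month key
    df$week  <- (as.integer(format(df$Interval, "%d")) - 1L) %/% 7L + 1L
    df$month <- format(df$Interval, "%Y-%m")

    weekly <- aggregate(value ~ month + week, data = df, FUN = sum)
    print(weekly)
    ```

    Week 2 sums to 18366 as in the output above; week 3 is lower here (5325) only because the subset stops at 2012-06-16. Keeping month and week as separate columns, instead of one text label, leaves the result easy to sort chronologically.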

    Data

    library(data.table)
    DT <- fread(
      "rn   Interval    value
      1  2012-06-10   552
      2  2012-06-11  4850
      3  2012-06-12  4642
      4  2012-06-13  4132
      5  2012-06-14  4190
      6  2012-06-15  4186
      7  2012-06-16  1139
      8  2012-06-17   490
      9  2012-06-18  5156
      10 2012-06-19  4430
      11 2012-06-20  4447
      12 2012-06-21  4256
      13 2012-06-22  3856
      14 2012-06-23  1163
      15 2012-06-24   564
      16 2012-06-25  4866
      17 2012-06-26  4421
      18 2012-06-27  4206
      19 2012-06-28  4272
      20 2012-06-29  3993
      21 2012-06-30  1211
      22 2012-07-01   698
      23 2012-07-02  5770
      24 2012-07-03  5103
      25 2012-07-04   775
      26 2012-07-05  5140
      27 2012-07-06  4868
      28 2012-07-07  1225
      29 2012-07-08   671
      30 2012-07-09  5726
      31 2012-07-10  5176", drop = 1L)
    DT[, Interval := as.Date(Interval)]
    
