What is an efficient method for partitioning and aggregating intervals from timestamped rows in a data frame?

小鲜肉 2020-12-03 09:07

From a data frame with timestamped rows (strptime results), what is the best method for aggregating statistics for intervals?

Intervals could be an hour, a day, etc.

3 answers
  •  予麋鹿 (original poster)
    2020-12-03 09:54

    Standard functions to split vectors are cut and findInterval:

    v <- as.POSIXct(c(
      "2010-01-13 03:02:38 UTC",
      "2010-01-13 03:08:14 UTC",
      "2010-01-13 03:14:52 UTC",
      "2010-01-13 03:20:42 UTC",
      "2010-01-13 03:22:19 UTC"
    ))
    
    # Your function returns a list:
    interv(v, as.POSIXlt("2010-01-13 03:00:00 UTC"), 900)
    # [[1]]
    # [1] "2010-01-13 03:00:00"
    # [[2]]
    # [1] "2010-01-13 03:00:00"
    # [[3]]
    # [1] "2010-01-13 03:00:00"
    # [[4]]
    # [1] "2010-01-13 03:15:00 CET"
    # [[5]]
    # [1] "2010-01-13 03:15:00 CET"
    
    # cut returns a factor; you must provide proper breaks:
    cut(v, as.POSIXlt("2010-01-13 03:00:00 UTC")+0:2*900)
    # [1] 2010-01-13 03:00:00 2010-01-13 03:00:00 2010-01-13 03:00:00
    # [4] 2010-01-13 03:15:00 2010-01-13 03:15:00
    # Levels: 2010-01-13 03:00:00 2010-01-13 03:15:00
    
    # findInterval returns a vector of interval ids (same breaks as for cut)
    findInterval(v, as.POSIXlt("2010-01-13 03:00:00 UTC")+0:2*900)
    # [1] 1 1 1 2 2
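
    Once every row carries an interval label, the aggregation itself is a grouped summary. A minimal sketch, assuming a hypothetical data frame df with columns time and value (neither name comes from the question) and the mean as the statistic of interest:

    ```r
    # Hypothetical data: 'time' and 'value' are illustrative column names
    df <- data.frame(
      time = as.POSIXct(c(
        "2010-01-13 03:02:38", "2010-01-13 03:08:14", "2010-01-13 03:14:52",
        "2010-01-13 03:20:42", "2010-01-13 03:22:19"
      ), tz = "UTC"),
      value = c(10, 20, 30, 40, 50)
    )

    # 15-minute breaks starting at a chosen origin
    start  <- as.POSIXct("2010-01-13 03:00:00", tz = "UTC")
    breaks <- start + 0:2 * 900

    # Label each row with its interval, then aggregate per interval
    df$interval <- cut(df$time, breaks)
    res <- aggregate(value ~ interval, data = df, FUN = mean)
    ```

    Here res should hold one row per non-empty interval. Swapping FUN for sum, length, or another summary function gives other statistics.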
    

    For the record: cut has a method for the POSIXt type, but unfortunately there is no way to supply a start argument. The result is:

    cut(v,"15 min")
    # [1] 2010-01-13 03:02:00 2010-01-13 03:02:00 2010-01-13 03:02:00
    # [4] 2010-01-13 03:17:00 2010-01-13 03:17:00
    # Levels: 2010-01-13 03:02:00 2010-01-13 03:17:00
    

    As you can see, the intervals start at 03:02:00, the earliest timestamp truncated to the minute. You could adjust the labels of the output factor (convert the labels to times, round them, and convert back to character).
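
    An alternative to editing the labels, not taken from the answer above, is to floor the timestamps themselves to the interval width before grouping. A minimal sketch, assuming times in UTC and a width of 900 seconds, so that the boundaries fall on quarter hours:

    ```r
    v <- as.POSIXct(c(
      "2010-01-13 03:02:38", "2010-01-13 03:08:14", "2010-01-13 03:14:52",
      "2010-01-13 03:20:42", "2010-01-13 03:22:19"
    ), tz = "UTC")

    width <- 900  # interval width in seconds (15 minutes)

    # Round each timestamp down to a multiple of 'width' seconds since the epoch
    floored <- as.POSIXct(
      floor(as.numeric(v) / width) * width,
      origin = "1970-01-01", tz = "UTC"
    )

    # Interval start of each element as a clock time
    starts <- format(floored, "%H:%M:%S")
    ```

    The floored vector can then be used directly as a grouping key, for example in tapply or aggregate.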
