How to group time by every n minutes in R

Submitted by 為{幸葍}努か on 2019-12-11 12:26:37

Question


I have a dataframe with a lot of time series:

1   0:03    B   1
2   0:05    A   1
3   0:05    A   1
4   0:05    B   1
5   0:10    A   1
6   0:10    B   1
7   0:14    B   1
8   0:18    A   1
9   0:20    A   1
10  0:23    B   1
11  0:30    A   1

I want to group the time series into every 6 minutes and count the frequency of A and B:

1   0:06    A   2
2   0:06    B   2
3   0:12    A   1
4   0:12    B   1
5   0:18    A   1
6   0:24    A   1
7   0:24    B   1
8   0:18    A   1
9   0:30    A   1

Also, the class of the time series is character. What should I do?


Answer 1:


Here's an approach to convert times to POSIXct, cut the times by 6 minute intervals, then count.

First, give each timestamp a full date and time (year, month, day, hour, minute, second); this makes the approach easier to scale to larger datasets.

library(tidyverse)
library(lubridate)

# sample data
d <- data.frame(t = paste0("2019-06-02 ", 
                           c("0:03","0:06","0:09","0:12","0:15",
                             "0:18","0:21","0:24","0:27","0:30"), 
                           ":00"),
                g = c("A","A","B","B","B"))

d$t <- ymd_hms(d$t) # convert to POSIXct with `lubridate::ymd_hms()`

If you check the class of your new date column, you will see it is "POSIXct".

> class(d$t)
[1] "POSIXct" "POSIXt" 
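The times in the question are bare strings such as "0:03" with no date part. A minimal sketch of converting them with base R only, assuming a placeholder date (2019-06-02, borrowed from the sample data above) and UTC as the time zone; `time_chr` and `t` are illustrative names, not from the original post:

```r
# Time-of-day strings as in the question (class character, no date)
time_chr <- c("0:03", "0:05", "0:10", "0:30")

# Prepend a placeholder date (an assumption; use the real date if you have it)
# and parse with an explicit format and time zone.
t <- as.POSIXct(paste("2019-06-02", time_chr),
                format = "%Y-%m-%d %H:%M", tz = "UTC")

class(t)
format(t, "%H:%M")
```

Setting `tz` explicitly keeps the result independent of the machine's local time zone.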

Now that the data is in "POSIXct", you can cut it by minute intervals! We will add this new grouping factor as a new column called tc.

d$tc <- cut(d$t, breaks = "6 min")  
d
                     t g                  tc
1  2019-06-02 00:03:00 A 2019-06-02 00:03:00
2  2019-06-02 00:06:00 A 2019-06-02 00:03:00
3  2019-06-02 00:09:00 B 2019-06-02 00:09:00
4  2019-06-02 00:12:00 B 2019-06-02 00:09:00
5  2019-06-02 00:15:00 B 2019-06-02 00:15:00
6  2019-06-02 00:18:00 A 2019-06-02 00:15:00
7  2019-06-02 00:21:00 A 2019-06-02 00:21:00
8  2019-06-02 00:24:00 B 2019-06-02 00:21:00
9  2019-06-02 00:27:00 B 2019-06-02 00:27:00
10 2019-06-02 00:30:00 B 2019-06-02 00:27:00
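Note that with breaks = "6 min" the first bin starts at the earliest timestamp (00:03 here), so the bins are not aligned to the clock. If clock-aligned bins (00:00, 00:06, ...) are wanted, one option is to pass explicit break points to cut(). This is a sketch under assumptions: `day_start` and the 36-minute range are made up for illustration, and the times are the distinct minutes from the question's data.

```r
# Distinct observation times from the question, as minutes after midnight (UTC)
day_start <- as.POSIXct("2019-06-02 00:00:00", tz = "UTC")
t <- day_start + 60 * c(3, 5, 10, 14, 18, 20, 23, 30)

# Explicit break points every 6 minutes, starting at midnight: 00:00 ... 00:36
breaks <- seq(day_start, by = "6 min", length.out = 7)

# cut() on date-times is left-closed by default: [00:00, 00:06), [00:06, 00:12), ...
tc <- cut(t, breaks = breaks)

# Integer bin index for each observation
as.integer(tc)
```

With these left-closed bins, 00:18 falls into the bin starting at 00:18, not the one ending there.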

Now you can group by this new interval (tc) and your grouping column (g), and count the frequency of occurrences. Getting the frequency of observations in a group is a fairly common operation, so dplyr provides count for this:

count(d, g, tc)
# A tibble: 7 x 3
  g     tc                      n
  <fct> <fct>               <int>
1 A     2019-06-02 00:03:00     2
2 A     2019-06-02 00:15:00     1
3 A     2019-06-02 00:21:00     1
4 B     2019-06-02 00:09:00     2
5 B     2019-06-02 00:15:00     1
6 B     2019-06-02 00:21:00     1
7 B     2019-06-02 00:27:00     2

If you run ?dplyr::count in the console, you'll see that count(d, g, tc) is roughly equivalent to group_by(d, g, tc) %>% summarise(n = n()).
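The equivalence can be checked on a small example. This sketch uses a made-up four-row data frame (not the data from the post) and assumes dplyr 1.0 or later for the `.groups` argument:

```r
library(dplyr)

# Tiny illustrative data set (hypothetical values)
d_small <- data.frame(g  = c("A", "A", "B", "A"),
                      tc = c("00:00", "00:00", "00:00", "00:06"))

# Shortcut
via_count <- count(d_small, g, tc)

# Long form: group, then count the rows per group, then drop the grouping
via_summarise <- d_small %>%
  group_by(g, tc) %>%
  summarise(n = n(), .groups = "drop")

# Compare as plain data frames, ignoring the tibble class
all.equal(as.data.frame(via_count), as.data.frame(via_summarise))
```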




Answer 2:


According to the sample dataset, the time series is given as time-of-day, i.e., without date.

The data.table package has the ITime class, a time-of-day class stored as the integer number of seconds in the day. With data.table, we can use a rolling join to map times to the upper limit of the 6-minute intervals (right-closed intervals):

library(data.table)

# coerce from character to class ITime
setDT(ts)[, time := as.ITime(time)]

# create sequence of breaks
breaks <- as.ITime(seq(as.ITime("0:00"), as.ITime("23:59:59"), as.ITime("0:06")))

# rolling join and aggregate
ts[, CJ(breaks, group, unique = TRUE)
   ][ts, on = .(group, breaks = time), roll = -Inf, .(x.breaks, group)
     ][, .N, by = .(upper = x.breaks, group)]

which returns

      upper group N
1: 00:06:00     B 2
2: 00:06:00     A 2
3: 00:12:00     A 1
4: 00:12:00     B 1
5: 00:18:00     B 1
6: 00:18:00     A 1
7: 00:24:00     A 1
8: 00:24:00     B 1
9: 00:30:00     A 1

Addendum

If the direction of the rolling join is changed (roll = +Inf instead of roll = -Inf), we get left-closed intervals:

ts[, CJ(breaks, group, unique = TRUE)
   ][ts, on = .(group, breaks = time), roll = +Inf, .(x.breaks, group)
     ][, .N, by = .(lower = x.breaks, group)]

which changes the result significantly:

      lower group N
1: 00:00:00     B 2
2: 00:00:00     A 2
3: 00:06:00     A 1
4: 00:06:00     B 1
5: 00:12:00     B 1
6: 00:18:00     A 2
7: 00:18:00     B 1
8: 00:30:00     A 1
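Both results can be cross-checked without a join, because a time of day is just a number of seconds since midnight: rounding up to a multiple of 360 gives the right-closed upper limit, and rounding down gives the left-closed lower limit. A sketch in base R using the question's data; all variable names are illustrative.

```r
# Times and groups from the question
time_chr <- c("0:03", "0:05", "0:05", "0:05", "0:10", "0:10",
              "0:14", "0:18", "0:20", "0:23", "0:30")
group    <- c("B", "A", "A", "B", "A", "B", "B", "A", "A", "B", "A")

# Convert "H:MM" strings to integer seconds since midnight
parts <- strsplit(time_chr, ":", fixed = TRUE)
secs  <- vapply(parts,
                function(p) as.integer(p[1]) * 3600L + as.integer(p[2]) * 60L,
                integer(1))

bin <- 6L * 60L                              # bin width in seconds

upper <- ((secs + bin - 1L) %/% bin) * bin   # round up:   right-closed bins
lower <- (secs %/% bin) * bin                # round down: left-closed bins

# Cross-tabulate bin limit (in seconds) against group
right_closed <- table(upper = upper, group = group)
left_closed  <- table(lower = lower, group = group)

right_closed
left_closed
```

Unlike the join output, the cross-tabulation also lists combinations with a count of zero.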

Data

library(data.table)
ts <- fread("
1   0:03    B   1
2   0:05    A   1
3   0:05    A   1
4   0:05    B   1
5   0:10    A   1
6   0:10    B   1
7   0:14    B   1
8   0:18    A   1
9   0:20    A   1
10  0:23    B   1
11  0:30    A   1"
, header = FALSE
, col.names = c("rn", "time", "group", "value"))


Source: https://stackoverflow.com/questions/56451761/how-to-group-time-by-every-n-minutes-in-r
