In R, how do I split timestamp interval data into regular slots?

广开言路 2020-12-11 18:23

I'm working with data describing events that have a start and an end time. For instance, it could be in the form of:

I'd like to convert this data to a form where

1 Answer
  • 2020-12-11 18:45

    You might also approach this by treating each start time as adding one active event and each end time as removing one. Sorting all of those changes by time and taking a running sum gives the number of active events at any given instant, and it scales well. (I've used something similar to count millions of events and it's basically instantaneous.)

    library(tidyverse)  # dplyr, tidyr, ggplot2
    library(lubridate)  # ymd_h(), used further down

    df2 <- df1 %>%
      gather(type, time, start_date:end_date) %>%
      mutate(event_chg = if_else(type == "start_date", 1, -1)) %>%
      arrange(time) %>%
      mutate(active_events = cumsum(event_chg))
    
    df2
    # A tibble: 4 x 5
    #     id type       time                event_chg active_events
    #  <dbl> <chr>      <dttm>                  <dbl>         <dbl>
    #1     2 start_date 2018-12-10 13:29:37         1             1
    #2     2 end_date   2018-12-10 14:02:37        -1             0
    #3     1 start_date 2018-12-10 14:45:51         1             1
    #4     1 end_date   2018-12-10 14:59:04        -1             0
    
    ggplot(df2, aes(time, active_events)) + geom_step()
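
    To read the count at one arbitrary instant rather than plotting it, the running sum can be looked up by position. This is a minimal base R sketch, not part of the original answer: the change times are copied from the printed df2 above, and `active_at()` is a hypothetical helper name.

    ```r
    tz <- "Australia/Brisbane"

    # Sorted change points, copied from the df2 output above
    chg_time <- as.POSIXct(c("2018-12-10 13:29:37", "2018-12-10 14:02:37",
                             "2018-12-10 14:45:51", "2018-12-10 14:59:04"),
                           tz = tz)
    event_chg <- c(1, -1, 1, -1)          # +1 for a start, -1 for an end
    active_events <- cumsum(event_chg)    # running count after each change

    # Hypothetical helper: number of events active at instant(s) t.
    # findInterval() returns how many change points are <= t, so an event
    # that ends exactly at t is treated as already finished.
    active_at <- function(t) {
      idx <- findInterval(as.numeric(t), as.numeric(chg_time))
      c(0, active_events)[idx + 1]        # idx == 0 means "before any event"
    }

    active_at(as.POSIXct("2018-12-10 13:45:00", tz = tz))
    ```

    `findInterval()` needs the change times sorted in increasing order, which is what `arrange(time)` provides in the pipeline above.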
    

    If you want to also assess the active count at regular intervals, you could integrate those intervals into your output data frame like this:

    df2b <- df1 %>%
      gather(type, time, start_date:end_date) %>%
      mutate(event_chg = if_else(type == "start_date", 1, -1)) %>%
      #  NEW SECTION HERE
      bind_rows(tibble(type = "marker",
                   time = seq.POSIXt(ymd_h(2018121013, tz = "Australia/Brisbane"), 
                                     ymd_h(2018121016, tz = "Australia/Brisbane"), 
                                     by  = 15*60), # 15 minutes of seconds = 15*60
                   event_chg = 0)) %>% 
      #  END OF NEW SECTION
      arrange(time) %>%
      mutate(active_events = cumsum(event_chg))
    

    Then it's possible to plot those counts directly, or filter the output data frame to see them. In this case, event id 1 (14:45:51 to 14:59:04) falls entirely between the 14:45 and 15:00 markers, so neither marker counts it.

    ggplot(df2b, aes(time, active_events, label = active_events)) + 
      geom_step() +
      geom_point(data = df2b %>% filter(type == "marker")) +
      geom_text(data = df2b %>% filter(type == "marker"), vjust = -0.5)
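
    For the filtering route, here is a self-contained sketch that keeps only the marker rows, giving one count per 15-minute slot. It is an illustration under assumptions: `df1` is reconstructed from the printed output earlier (the original input is not shown), `markers` and `slot_counts` are names made up for this example, and `pivot_longer()` is swapped in as the current tidyr replacement for `gather()`.

    ```r
    library(dplyr)
    library(tidyr)
    library(lubridate)

    tz <- "Australia/Brisbane"

    # Assumed input, rebuilt from the df2 output shown earlier
    df1 <- tibble(
      id         = c(1, 2),
      start_date = ymd_hms(c("2018-12-10 14:45:51", "2018-12-10 13:29:37"), tz = tz),
      end_date   = ymd_hms(c("2018-12-10 14:59:04", "2018-12-10 14:02:37"), tz = tz)
    )

    # One zero-change row every 15 minutes from 13:00 to 16:00
    markers <- tibble(
      type      = "marker",
      time      = seq(ymd_h("2018-12-10 13", tz = tz),
                      ymd_h("2018-12-10 16", tz = tz),
                      by = "15 min"),
      event_chg = 0
    )

    slot_counts <- df1 %>%
      pivot_longer(c(start_date, end_date),
                   names_to = "type", values_to = "time") %>%
      mutate(event_chg = if_else(type == "start_date", 1, -1)) %>%
      bind_rows(markers) %>%
      arrange(time) %>%
      mutate(active_events = cumsum(event_chg)) %>%
      filter(type == "marker") %>%        # keep only the regular slots
      select(time, active_events)

    slot_counts
    ```

    With this input, only the 13:30, 13:45 and 14:00 markers report an active event. Each row is the count at that exact instant, not the number of events that touched the slot, which is why event id 1 never appears.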
    
