Find start and end positions/indices of runs/consecutive values

生来不讨喜 2020-11-30 11:47

Problem: Given an atomic vector, find the start and end indices of runs in the vector.

Example vector with runs:

x = rev(rep(6:10, 1:5))
# [1] 10 10 10 10 10  9  9  9  9  8  8  8  7  7  6


        
2 Answers
  • 2020-11-30 12:11

    A data.table possibility, where .I and .N are used to pick relevant indices, per group defined by rleid runs.

    library(data.table)
    data.table(x)[ , .(start = .I[1], end = .I[.N]), by = rleid(x)][, rleid := NULL][]
    #    start end
    # 1:     1   5
    # 2:     6   9
    # 3:    10  12
    # 4:    13  14
    # 5:    15  15
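
    For readers without data.table, the same per-run grouping idea can be sketched in base R. This is only an illustration: `run_id` is a hypothetical stand-in for what `rleid(x)` produces, and taking the minimum and maximum index per run plays the role of `.I[1]` and `.I[.N]`. It assumes a non-empty vector without NA values.

    ```r
    x <- rev(rep(6:10, 1:5))

    # Hypothetical stand-in for rleid(): the id increases by one each time
    # the value differs from the previous element.
    run_id <- cumsum(c(TRUE, x[-1] != x[-length(x)]))

    # Element positions, grouped by run id below.
    idx <- seq_along(x)

    # First and last position within each run.
    bounds <- data.frame(
      start = as.vector(tapply(idx, run_id, min)),
      end   = as.vector(tapply(idx, run_id, max))
    )
    bounds
    ```

    The resulting start and end columns should match the data.table output above.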
    
  • 2020-11-30 12:21

    Core logic:

    # Example vector and rle object
    x = rev(rep(6:10, 1:5))
    rle_x = rle(x)
    
    # Compute endpoints of run
    end = cumsum(rle_x$lengths)
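    # Each run starts one past the previous run's end. Base R's stats::lag()
    # does not shift values, so drop the last endpoint with head() instead.
    start = c(1, head(end, -1) + 1)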
    
    # Display results
    data.frame(start, end)
    #   start end
    # 1     1   5
    # 2     6   9
    # 3    10  12
    # 4    13  14
    # 5    15  15
    

    Tidyverse/dplyr way (data frame-centric):

    library(dplyr)
    
    rle(x) %>%
      unclass() %>%
      as.data.frame() %>%
      mutate(end = cumsum(lengths),
             start = c(1, dplyr::lag(end)[-1] + 1)) %>%
      magrittr::extract(c(1,2,4,3)) # To re-order start before end for display
    

    Because the start and end vectors are the same length as the values component of the rle object, solving the related problem of identifying endpoints for runs meeting some condition is straightforward: filter or subset the start and end vectors using the condition on the run values.
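
    A minimal sketch of that filtering step in base R. The thresholds (value at least 9, length at least 3) are arbitrary examples, and `start` is computed here as `end - lengths + 1`, an equivalent variant of the calculation above.

    ```r
    x <- rev(rep(6:10, 1:5))
    rle_x <- rle(x)

    # Endpoints of every run, aligned with rle_x$values and rle_x$lengths.
    end <- cumsum(rle_x$lengths)
    start <- end - rle_x$lengths + 1
    runs <- data.frame(value = rle_x$values, start, end)

    # Condition on the run values: keep runs whose value is at least 9.
    high_runs <- runs[rle_x$values >= 9, ]

    # Condition on the run lengths: keep runs of three or more elements.
    long_runs <- runs[rle_x$lengths >= 3, ]

    high_runs
    long_runs
    ```

    Because `runs` has one row per run, any logical vector built from `rle_x$values` or `rle_x$lengths` can be used as the row index.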
