Problem: Given an atomic vector, find the start and end indices of each run (a maximal stretch of consecutive equal values) in the vector.
Example vector with runs:
x = rev(rep(6:10, 1:5))
# [1] 10 10 10 10 10  9  9  9  9  8  8  8  7  7  6
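A data.table possibility: group by run using rleid(x), then within each group use .I (the row numbers of that group in the original table) and .N (the number of rows in the group) to pick the first and last index of the run.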
library(data.table)
data.table(x)[ , .(start = .I[1], end = .I[.N]), by = rleid(x)][, rleid := NULL][]
#    start end
# 1:     1   5
# 2:     6   9
# 3:    10  12
# 4:    13  14
# 5:    15  15
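For readers without data.table, the same "group by run, then take the first and last index" idea can be sketched in base R. This is an illustrative sketch, not part of the original answer: a cumulative sum over value changes stands in for rleid() (assumed equivalent here only because x contains no NA), and split() collects the indices of each run so that the first and last element play the roles of .I[1] and .I[.N].

```r
x = rev(rep(6:10, 1:5))

# Run id that increments whenever the value changes
# (stand-in for data.table::rleid(); assumes x has no NA values)
run_id = cumsum(c(TRUE, x[-1] != x[-length(x)]))

# Indices belonging to each run, in order
groups = split(seq_along(x), run_id)

# First and last index of each run, mirroring .I[1] and .I[.N]
runs = data.frame(start = vapply(groups, function(i) i[1], integer(1)),
                  end   = vapply(groups, function(i) i[length(i)], integer(1)))
runs
#   start end
# 1     1   5
# 2     6   9
# 3    10  12
# 4    13  14
# 5    15  15
```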
Core logic:
# Example vector and rle object
x = rev(rep(6:10, 1:5))
rle_x = rle(x)
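# Compute the end and start index of each run
end = cumsum(rle_x$lengths)        # last index of each run
start = end - rle_x$lengths + 1    # first index of each run (base stats::lag() would not shift a plain vector)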
# Display results
data.frame(start, end)
#   start end
# 1     1   5
# 2     6   9
# 3    10  12
# 4    13  14
# 5    15  15
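The core logic can be wrapped in a small reusable function. The name run_bounds() is hypothetical (it is not from any package); the sketch also keeps the run values, so each row records which value the run holds alongside its start and end index.

```r
# Hypothetical helper (not from any package) wrapping the rle()-based core logic
run_bounds = function(x) {
  rle_x = rle(x)                       # run lengths and run values
  end = cumsum(rle_x$lengths)          # last index of each run
  start = end - rle_x$lengths + 1      # first index of each run
  data.frame(value = rle_x$values, start = start, end = end)
}

run_bounds(rev(rep(6:10, 1:5)))
#   value start end
# 1    10     1   5
# 2     9     6   9
# 3     8    10  12
# 4     7    13  14
# 5     6    15  15
```

Since rle() accepts any atomic vector, the same helper works on, for example, character input such as run_bounds(c("a", "a", "b")).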
Tidyverse/dplyr way (data frame-centric):
library(dplyr)
rle(x) %>%
  unclass() %>%                 # drop the "rle" class so the list can become a data frame
  as.data.frame() %>%           # columns: lengths, values
  mutate(end = cumsum(lengths),
         start = dplyr::lag(end, default = 0L) + 1) %>%
  select(lengths, values, start, end)  # order start before end for display
Because the start and end vectors are the same length as the values component of the rle object, solving the related problem of identifying endpoints for runs meeting some condition is straightforward: filter or subset the start and end vectors using the condition on the run values.
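A minimal sketch of that filtering step, using the same vector. The condition (keep runs whose value is even) is an arbitrary example chosen for illustration; any logical vector with one element per run can be used as the index.

```r
x = rev(rep(6:10, 1:5))
rle_x = rle(x)
end = cumsum(rle_x$lengths)          # last index of each run
start = end - rle_x$lengths + 1      # first index of each run

# Example condition on the run values (arbitrary choice): keep runs of even values
keep = rle_x$values %% 2 == 0

# Subset values, start and end with the same logical index
runs = data.frame(value = rle_x$values[keep], start = start[keep], end = end[keep])
runs
#   value start end
# 1    10     1   5
# 2     8    10  12
# 3     6    15  15
```

The same indexing works for a condition on run lengths (for example keep = rle_x$lengths >= 3), because lengths also has one element per run.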