Question
Let's say I have this integer vector:
> int.vec
[1] 1 2 3 5 6 7 10 11 12 13
(created from int.vec <- c(1:3,5:7,10:13))
I'm looking for a function that will return the sum of the lengths of all intervals in this vector.
So basically for int.vec this function will return:
3+3+4 = 10
Answer 1:
We can create a grouping variable by taking the difference of adjacent elements, flagging where it is not equal to 1 (the start of a new interval), and taking the cumsum of those flags. Then use tapply to get the length of each group and sum the output.
sum(tapply(int.vec,cumsum(c(TRUE,diff(int.vec) !=1)), FUN=length))
#[1] 10
Or use table and sum
sum(table(cumsum(c(TRUE,diff(int.vec) !=1))))
#[1] 10
Or we split the "int.vec" with the grouping variable derived from cumsum (split is very fast) and get the length of each list element with lengths (another fast option) - contributed by @Frank
sum(lengths(split(int.vec, cumsum(c(0,diff(int.vec)>1)))))
NOTE: No packages used. This will be helpful for identifying the individual length of each component (in case we needed that) by just removing the sum wrapper.
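To make the grouping step concrete, here is a minimal sketch that prints the intermediate values for the same int.vec. The names gaps, grp, and interval.lengths are introduced here for illustration only.

```r
int.vec <- c(1:3, 5:7, 10:13)

# TRUE wherever a new interval starts (the first element always starts one)
gaps <- c(TRUE, diff(int.vec) != 1)

# running count of interval starts = interval id for every element
grp <- cumsum(gaps)
grp
# [1] 1 1 1 2 2 2 3 3 3 3

# length of each interval, then the total
interval.lengths <- lengths(split(int.vec, grp), use.names = FALSE)
interval.lengths
# [1] 3 3 4
sum(interval.lengths)
# [1] 10
```

Dropping the final sum leaves the per-interval lengths, which is the case the note above describes.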
Based on further insights from @Symbolix's solution, the OP's expected output is just the length of the vector.
NROW(int.vec)
#[1] 10
can be used as well. This will also work if we are working with a data.frame. But, as I mentioned above, it seems that the OP may need both the length of each interval and the total length. The solutions above provide both.
Answer 2:
The "cgwtools" package has a function called seqle that might be helpful here.
library(cgwtools)
int.vec <- c(1:3,5:7,10:13)
seqle(int.vec)
# Run Length Encoding
# lengths: int [1:3] 3 3 4
# values : int [1:3] 1 5 10
The result is a list, so you can just access and sum the "lengths" values with:
sum(seqle(int.vec)$lengths)
# [1] 10
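If installing a package is not an option, a similar summary can be sketched in base R. This is only an illustration: interval_table is a hypothetical helper name (it is not part of cgwtools), and it assumes the input is sorted in increasing order with no duplicates.

```r
# Hypothetical helper (not part of cgwtools): one row per run of consecutive integers.
# Assumes x is sorted in increasing order without duplicates.
interval_table <- function(x) {
  stopifnot(is.numeric(x))
  if (length(x) == 0L) {
    return(data.frame(start = numeric(0), end = numeric(0), length = integer(0)))
  }
  # a new group starts wherever the step from the previous value is not 1
  grp <- cumsum(c(TRUE, diff(x) != 1))
  data.frame(
    start  = as.vector(tapply(x, grp, min)),
    end    = as.vector(tapply(x, grp, max)),
    length = as.vector(tapply(x, grp, length))
  )
}

res <- interval_table(c(1:3, 5:7, 10:13))
res$length
# [1] 3 3 4
sum(res$length)
# [1] 10
```

Returning a data.frame keeps the start and end of each interval next to its length, which is roughly the information seqle reports as values and lengths.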
Answer 3:
length(int.vec)
# [1] 10
Your intervals are sequences of numbers, x1:xn, x1:xm, x1:xp, where the length of each vector (or interval in this case) is n, m, and p respectively.
The length of the whole vector is length(x1:xn) + length(x1:xm) + length(x1:xp),
which is simply n + m + p, that is, length(int.vec).
Now, if we really are interested in the length of each individual vector of sequences, we can do
int.vec <- c(1:3,5:7,10:13)
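## use run-length encoding (rle) to find runs where the difference == 1
r <- rle(diff(int.vec) == 1)
## keep only the TRUE runs; a run of k unit steps spans k + 1 values
## (isolated values produce no TRUE run, so they are not reported)
r$lengths[r$values] + 1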
# [1] 3 3 4
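One caveat worth checking: run-length encoding on diff() only sees the steps between neighbours, so intervals of just two values and isolated values need care. A small sketch on a made-up vector (edge.vec is an illustrative name), keeping only the TRUE runs via the values component:

```r
edge.vec <- c(1:2, 4:5, 9)   # two intervals of length 2 and one isolated value

r <- rle(diff(edge.vec) == 1)
# diff(edge.vec) is 1 2 1 4, so the runs are TRUE, FALSE, TRUE, FALSE (each of length 1)
# filtering on run length alone (e.g. lengths != 1) would drop both intervals here
r$lengths[r$values] + 1
# [1] 2 2

# the cumsum grouping also counts the isolated value as an interval of length 1
lengths(split(edge.vec, cumsum(c(TRUE, diff(edge.vec) != 1))), use.names = FALSE)
# [1] 2 2 1
```

Which of the two results is wanted depends on whether a single isolated value should count as an interval.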
And, as pointed out by @AHandcartAndMohair, if you're working with a list you can use lengths
int.list <- list(c(1:3), c(5:7), c(10:13))
lengths(int.list)
# [1] 3 3 4
Source: https://stackoverflow.com/questions/36122916/sum-of-intervals-lengths-from-an-integer-vector