Question
Let's say I have this integer vector:
> int.vec
[1] 1 2 3 5 6 7 10 11 12 13
(created with int.vec <- c(1:3,5:7,10:13))
I'm looking for a function that will return the sum of the lengths of all intervals in this vector.
So basically, for int.vec this function will return:
3+3+4 = 10
Answer 1:
We can create a grouping variable by taking the difference of adjacent elements, checking whether it is not equal to 1, and taking the cumsum of the result. We then use tapply to get the length of each group, and sum the output.
sum(tapply(int.vec,cumsum(c(TRUE,diff(int.vec) !=1)), FUN=length))
#[1] 10
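To see how the grouping variable is built, the one-liner above can be unpacked into its intermediate steps. This is only an illustrative breakdown; the names gaps, new.run, grp and run.len are made up for the example.

```r
int.vec <- c(1:3, 5:7, 10:13)

# differences between adjacent elements
gaps <- diff(int.vec)
gaps
# [1] 1 1 2 1 1 3 1 1 1

# TRUE marks the first element of each run (the first element always starts one)
new.run <- c(TRUE, gaps != 1)

# the running count of run starts gives every element its group id
grp <- cumsum(new.run)
grp
# [1] 1 1 1 2 2 2 3 3 3 3

# number of elements per group, then the total
run.len <- tapply(int.vec, grp, FUN = length)
run.len
# 1 2 3
# 3 3 4
sum(run.len)
# [1] 10
```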
Or use table and sum:
sum(table(int.vec,cumsum(c(TRUE,diff(int.vec) !=1))))
#[1] 10
Or we can split "int.vec" by the grouping variable derived from cumsum (split is very fast) and get the length of each list element with lengths (another fast option), as contributed by @Frank:
sum(lengths(split(int.vec, cumsum(c(0,diff(int.vec)>1)))))
NOTE: No packages are used. Removing the sum wrapper gives the individual length of each component, in case that is needed.
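As a small sketch of that last point, dropping sum from the split/lengths version returns one length per interval. The names grp and run.len are hypothetical, and lengths assumes R 3.2.0 or later.

```r
int.vec <- c(1:3, 5:7, 10:13)

# group id for each element: a new group starts wherever the gap is not 1
grp <- cumsum(c(TRUE, diff(int.vec) != 1))

# one length per interval (the names are the group ids)
run.len <- lengths(split(int.vec, grp))
run.len
# 1 2 3
# 3 3 4

# wrapping the same result in sum gives the total again
sum(run.len)
# [1] 10
```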
Based on further insights from @Symbolix's solution, the OP's expected output is just the length of the vector.
NROW(int.vec)
#[1] 10
can be used as well. This will also work if we are working with a data.frame. But, as mentioned above, it seems that the OP needs to identify the length of each interval as well as the total length. This solution provides both.
Answer 2:
The "cgwtools" package has a function called seqle that might be helpful here.
library(cgwtools)
int.vec <- c(1:3,5:7,10:13)
seqle(int.vec)
# Run Length Encoding
# lengths: int [1:3] 3 3 4
# values : int [1:3] 1 5 10
The result is a list, so you can just access and sum the "lengths" values with:
sum(seqle(int.vec)$lengths)
# [1] 10
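If installing a package is not an option, a similar summary can be sketched in base R. This is not how seqle is implemented, just a commonly used trick: within a run of consecutive increasing integers, the value minus its position is constant, so rle on that difference yields the run lengths. The names offset and r are made up for the example.

```r
int.vec <- c(1:3, 5:7, 10:13)

# value minus position stays constant inside a run of consecutive integers
offset <- int.vec - seq_along(int.vec)
offset
# [1] 0 0 0 1 1 1 3 3 3 3

# run-length encode the offsets: one run per interval
r <- rle(offset)
r$lengths
# [1] 3 3 4
sum(r$lengths)
# [1] 10
```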
Answer 3:
length(int.vec)
# 10
Your intervals are sequences of numbers, x1:xn, x1:xm, x1:xp, where the length of each vector (or interval in this case) is n, m, and p respectively.
The length of the whole vector is length(x1:xn) + length(x1:xm) + length(x1:xp), which is the same as n + m + p.
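As a quick check on the example vector, the lengths of the three pieces add up to the length of the whole vector. The names parts and whole are only for illustration.

```r
int.vec <- c(1:3, 5:7, 10:13)

# sum of the lengths of the three sequences
parts <- length(1:3) + length(5:7) + length(10:13)
parts
# [1] 10

# length of the combined vector
whole <- length(int.vec)
whole
# [1] 10
```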
Now, if we really are interested in the length of each individual vector of sequences, we can do
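int.vec <- c(1:3,5:7,10:13)
## use run-length-encoding (rle) to find sequences where the difference == 1
r <- rle(diff(int.vec) == 1)
## keep only the TRUE runs (selecting on the run length instead would miscount
## two-element intervals and repeated gaps); k differences of 1 span k + 1 elements
## note: isolated values (intervals of length 1) are not reported
r$lengths[r$values] + 1
# [1] 3 3 4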
And, as pointed out by @AHandcartAndMohair, if you're working with a list you can use lengths:
int.list <- list(c(1:3), c(5:7), c(10:13))
lengths(int.list)
# [1] 3 3 4
Source: https://stackoverflow.com/questions/36122916/sum-of-intervals-lengths-from-an-integer-vector