Sum of intervals lengths from an integer vector

Submitted by 怎甘沉沦 on 2019-12-10 13:46:14

Question


Let's say I have this integer vector:

> int.vec
 [1]  1  2  3  5  6  7 10 11 12 13

(created from int.vec <- c(1:3,5:7,10:13))

I'm looking for a function that will return the sum of the lengths of all intervals in this vector.

So basically for int.vec this function will return:

3+3+4 = 10

Answer 1:


We can create a grouping variable by taking the difference of adjacent elements, check whether that is not equal to 1, get the cumsum, use tapply to get the length, and sum the output.

sum(tapply(int.vec,cumsum(c(TRUE,diff(int.vec) !=1)), FUN=length))
#[1] 10
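
To see how the grouping variable comes about, the intermediate values can be printed step by step. This is only an illustrative sketch; the names gaps and grp are introduced here for readability and are not part of the original answer.

```r
int.vec <- c(1:3, 5:7, 10:13)

# differences between adjacent elements: 1 inside an interval, > 1 at a gap
gaps <- diff(int.vec)
gaps
# [1] 1 1 2 1 1 3 1 1 1

# TRUE marks the start of a new interval; the first element always starts one,
# so the running sum gives every element the index of its interval
grp <- cumsum(c(TRUE, gaps != 1))
grp
# [1] 1 1 1 2 2 2 3 3 3 3

# number of elements per interval, then the total
per.interval <- tapply(int.vec, grp, FUN = length)
sum(per.interval)
# [1] 10
```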

Or use table and sum

sum(table(cumsum(c(TRUE, diff(int.vec) != 1))))
#[1] 10

Or we split the "int.vec" with the grouping variable derived from cumsum (split is very fast) and get the length of each list element with lengths (another fast option) - contributed by @Frank

sum(lengths(split(int.vec, cumsum(c(0,diff(int.vec)>1)))))

NOTE: No packages used. This will be helpful for identifying the individual length of each component (in case we needed that) by just removing the sum wrapper.
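
If the per-interval lengths are needed more than once, the split-based version can be wrapped in a small function. The sketch below assumes a sorted vector of integer values; the name interval_lengths and the second example vector are made up for illustration.

```r
# hypothetical helper: length of each run of consecutive integers in a sorted vector
interval_lengths <- function(x) {
  # guard: an empty input has no intervals
  if (length(x) == 0L) return(integer(0))
  # a new group starts wherever the step to the next element is not 1
  grp <- cumsum(c(TRUE, diff(x) != 1))
  unname(lengths(split(x, grp)))
}

interval_lengths(c(1:3, 5:7, 10:13))
# [1] 3 3 4

# an isolated value (8 here) is reported as an interval of length 1
interval_lengths(c(2:4, 8, 10:11))
# [1] 3 1 2
```

Wrapping the result in sum() gives the total again.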


Based on further insights from @Symbolix's solution, the OP's expected output is just the length of the vector.

NROW(int.vec)
#[1] 10

can be used as well. This also works when the input is a data.frame. But, as mentioned above, it seems the OP needs both the length of each interval and the total length. The grouping-based solutions above provide both.




Answer 2:


The "cgwtools" package has a function called seqle that might be helpful here.

library(cgwtools)
int.vec <- c(1:3,5:7,10:13)
seqle(int.vec)
# Run Length Encoding
#   lengths: int [1:3] 3 3 4
#   values : int [1:3] 1 5 10

The result is a list, so you can just access and sum the "lengths" values with:

sum(seqle(int.vec)$lengths)
# [1] 10



Answer 3:


length(int.vec)
# 10

Your intervals are sequences of consecutive integers containing n, m, and p elements respectively.

The length of the whole vector is the sum of the lengths of those sequences, n + m + p, which is exactly what length(int.vec) returns.


Now, if we really are interested in the length of each individual sequence, we can do

int.vec <- c(1:3,5:7,10:13)

## use run-length encoding (rle) to find runs where the difference == 1
r <- rle(diff(int.vec) == 1)
## keep the TRUE runs by value; k consecutive differences of 1 span k + 1 elements
r$lengths[r$values] + 1
# [1] 3 3 4
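
One caveat with the run-length approach: it only sees runs where consecutive differences equal 1, so an isolated value never appears in its output. A small sketch on a made-up vector x, selecting the TRUE runs by value and comparing the result with the cumsum grouping from the first answer:

```r
x <- c(1:2, 4, 6:9)   # hypothetical input: 4 is an isolated value

# runs of differences equal to 1; each TRUE run of k differences covers k + 1 elements
r <- rle(diff(x) == 1)
rle.lengths <- r$lengths[r$values] + 1
rle.lengths
# [1] 2 4

# grouping on gaps also counts the isolated value as an interval of length 1
grp.lengths <- unname(lengths(split(x, cumsum(c(TRUE, diff(x) != 1)))))
grp.lengths
# [1] 2 1 4
```

Only the second result sums to length(x), so the choice depends on whether single values should count as intervals.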

And, as pointed out by @AHandcartAndMohair, if you're working with a list you can use lengths

int.list <- list(c(1:3), c(5:7), c(10:13))
lengths(int.list)
# [1] 3 3 4


Source: https://stackoverflow.com/questions/36122916/sum-of-intervals-lengths-from-an-integer-vector
