Question
I am trying to collapse a series of intervals into fewer, equally meaningful intervals.
Consider, for example, this list of intervals:
Intervals = list(
c(23,34),
c(45,48),
c(31,35),
c(7,16),
c(5,9),
c(56,57),
c(55,58)
)
Because the intervals overlap, the same information can be described with fewer vectors. Plotting these intervals makes it obvious that a list of 4 vectors would be enough:
plot(1,1,type="n",xlim=range(unlist(Intervals)),ylim=c(0.9,1.1))
segments(
x0=sapply(Intervals,"[",1),
x1=sapply(Intervals,"[",2),
y0=rep(1,length(Intervals)),
y1=rep(1,length(Intervals)),
lwd=10
)
How can I reduce my Intervals list so that it carries the same information as the one displayed on the plot? (Performance matters.)
The desired output for the above example is:
Intervals = list(
c(5,16),
c(23,35),
c(45,48),
c(55,58)
)
Answer 1:
What you need is the reduce function in the IRanges package (distributed through Bioconductor).
In.df <- do.call(rbind, Intervals)
library(IRanges)
In.ir <- IRanges(In.df[, 1], In.df[,2])
out.ir <- reduce(In.ir)
out.ir
# IRanges of length 4
# start end width
# [1] 5 16 12
# [2] 23 35 13
# [3] 45 48 4
# [4] 55 58 4
Answer 2:
One option with base R:
First I put your list in a data.frame:
ints <- as.data.frame(do.call(rbind, Intervals))
names(ints) <- c('start', 'stop')
so it looks like
start stop
1 23 34
2 45 48
3 31 35
4 7 16
5 5 9
6 56 57
7 55 58
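Now, two for loops compare with between, and expand an interval when a crossover is found. Note that between is not part of base R (dplyr and data.table each export one), so a one-line inclusive equivalent is defined first:
# Inclusive range check, standing in for dplyr::between()
between <- function(x, lo, hi) x >= lo & x <= hi
for(x in seq_len(nrow(ints))){
  for(y in seq_len(nrow(ints))){
    # Does interval x start inside interval y?
    if(between(ints$start[x], ints$start[y], ints$stop[y])){
      # Pull x's start back to y's start, then give both the larger stop
      ints$start[x] <- ints$start[y]
      if(ints$stop[y] > ints$stop[x]){
        ints$stop[x] <- ints$stop[y]
      } else {
        ints$stop[y] <- ints$stop[x]
      }
    }
  }
}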
which alters ints to
> ints
start stop
1 23 35
2 45 48
3 23 35
4 5 16
5 5 16
6 55 58
7 55 58
Simplify to unique cases:
# unique() on a data.frame already compares whole rows; no margin argument is needed
ints <- unique(ints)
and put them in order:
ints <- ints[order(ints$start),]
which leaves you with
> ints
start stop
4 5 16
1 23 35
2 45 48
6 55 58
If you want it back in a list like the original:
Intervals <- lapply(1:nrow(ints), function(x)c(ints[x,1], ints[x,2]))
(Note: You can certainly do this with *apply instead of for, Booleans instead of between, and the original list instead of a data.frame, but, well, this is readable. Rewrite/optimize as you like.)
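Since the question mentions performance, here is one possible vectorized rewrite in base R, offered as a sketch rather than as the method from either answer. It swaps the nested loops for a sort followed by a running maximum (cummax) of the stops. The function name merge_intervals is hypothetical, and the sketch assumes every interval is a length-2 numeric vector with start <= stop. Intervals that share an endpoint are merged; intervals that merely abut as integers, such as c(1,3) and c(4,6), are kept separate.

```r
# Hypothetical helper: merge overlapping intervals by sorting and sweeping.
# Assumes each element of `intervals` is c(start, stop) with start <= stop.
merge_intervals <- function(intervals) {
  if (length(intervals) == 0) return(list())
  m <- do.call(rbind, intervals)
  m <- m[order(m[, 1], m[, 2]), , drop = FALSE]   # sort by start, then stop
  start <- m[, 1]
  stop  <- m[, 2]
  n     <- nrow(m)
  reach <- cummax(stop)                 # furthest stop seen so far
  # A new merged interval begins wherever a start lies beyond everything before it
  new_group <- c(TRUE, start[-1] > reach[-n])
  is_last   <- c(new_group[-1], TRUE)   # last row of each merged group
  # With start <= stop, reach at a group's last row is that group's largest stop
  Map(c, start[new_group], reach[is_last])
}

Intervals <- list(
  c(23,34), c(45,48), c(31,35), c(7,16),
  c(5,9), c(56,57), c(55,58)
)
merged <- merge_intervals(Intervals)
str(merged)
```

The sort dominates the cost, so this runs in O(n log n) time instead of the O(n^2) of the double loop, and it returns a list in the same shape as the original Intervals.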
Source: https://stackoverflow.com/questions/35006269/how-to-combine-intervals-data-into-fewer-intervals-in-r