How to combine intervals data into fewer intervals in R?

Submitted by 限于喜欢 on 2019-12-10 18:33:51

Question


I am trying to collapse a series of intervals into fewer, equally meaningful intervals.

Consider, for example, this list of intervals:

Intervals = list(
  c(23,34),
  c(45,48),
  c(31,35),
  c(7,16),
  c(5,9),
  c(56,57),
  c(55,58)
)

Because some of the intervals overlap, the same coverage can be described with fewer vectors. Plotting the intervals makes it obvious that a list of 4 vectors would be enough:

plot(1,1,type="n",xlim=range(unlist(Intervals)),ylim=c(0.9,1.1))
segments(
    x0=sapply(Intervals,"[",1),
    x1=sapply(Intervals,"[",2),
    y0=rep(1,length(Intervals)),
    y1=rep(1,length(Intervals)),
    lwd=10
    )

How can I reduce my Intervals list so that it carries the same information as the plot? (Performance matters.)

The desired output for the above example is

Intervals = list(
  c(5,16),
  c(23,35),
  c(45,48),
  c(55,58)
)

Answer 1:


What you need is the reduce function in the IRanges package (distributed through Bioconductor).

In.df <- do.call(rbind, Intervals)
library(IRanges)

In.ir <- IRanges(In.df[, 1], In.df[,2])

out.ir <- reduce(In.ir)
out.ir
# IRanges of length 4
#     start end width
# [1]     5  16    12
# [2]    23  35    13
# [3]    45  48     4
# [4]    55  58     4
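
The question asked for a list of vectors rather than an IRanges object, so a conversion step may be useful. The following is a minimal sketch, assuming the start() and end() accessors that are available once IRanges is loaded; the name merged is only illustrative.

```r
library(IRanges)  # Bioconductor package, as used in the answer above

Intervals <- list(c(23, 34), c(45, 48), c(31, 35), c(7, 16),
                  c(5, 9), c(56, 57), c(55, 58))

In.df  <- do.call(rbind, Intervals)
out.ir <- reduce(IRanges(In.df[, 1], In.df[, 2]))

# Pair each start with its end to rebuild a list of length-2 vectors
merged <- Map(c, start(out.ir), end(out.ir))
```

One thing to keep in mind: with its default min.gapwidth = 1L, reduce() also merges ranges that are merely adjacent (for example 1-5 and 6-10). If only truly overlapping ranges should be merged, passing min.gapwidth = 0L is worth considering.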



Answer 2:


One option with base R:

First I put your list in a data.frame:

ints <- as.data.frame(do.call(rbind, Intervals))
names(ints) <- c('start', 'stop')

so it looks like

  start stop
1    23   34
2    45   48
3    31   35
4     7   16
5     5    9
6    56   57
7    55   58

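Now two nested for loops use between() to test whether the start of one interval falls inside another, and widen the intervals when an overlap is found. Note that between() is not part of base R (dplyr and data.table each provide one), so a one-line base equivalent is defined first:

# base-R stand-in for dplyr::between(): TRUE when lo <= x <= hi
between <- function(x, lo, hi) x >= lo & x <= hi

for(x in seq_len(nrow(ints))){
  for(y in seq_len(nrow(ints))){
    # does interval x start inside interval y?
    if(between(ints$start[x], ints$start[y], ints$stop[y])){
      ints$start[x] <- ints$start[y]
      # give both intervals the larger of the two stops
      if(ints$stop[y] > ints$stop[x]){
        ints$stop[x] <- ints$stop[y]
      } else {
        ints$stop[y] <- ints$stop[x]
      }
    }
  }
}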

which alters ints to

> ints
  start stop
1    23   35
2    45   48
3    23   35
4     5   16
5     5   16
6    55   58
7    55   58

Simplify to unique cases:

ints <- unique(ints)

and put them in order

ints <- ints[order(ints$start),]

which leaves you with

> ints
  start stop
4     5   16
1    23   35
2    45   48
6    55   58

If you want it back in a list like the original,

Intervals <- lapply(seq_len(nrow(ints)), function(x) c(ints[x, 1], ints[x, 2]))

(Note: You can certainly do this with *apply instead of for, Booleans instead of between, and the original list instead of a data.frame, but, well, this is readable. Rewrite/optimize as you like.)
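
Since the question stresses performance, a sort-based variant may be worth a look. The nested loops above compare every pair of intervals, and a single pass can leave chains of overlaps only partly merged (for instance c(1,3), c(5,7), c(2,6)). The sketch below is not taken from either answer: merge_intervals is a hypothetical helper that assumes closed intervals with start <= stop, sorts them by start, and uses cummax() to find where a new group begins.

```r
# Hypothetical helper (not from the answers): merge overlapping closed intervals.
# Assumes every element of `intervals` is c(start, stop) with start <= stop.
merge_intervals <- function(intervals) {
  if (length(intervals) == 0) return(list())
  m <- do.call(rbind, intervals)
  m <- m[order(m[, 1]), , drop = FALSE]    # sort by start
  start <- m[, 1]
  stop  <- m[, 2]
  n     <- length(start)
  reach <- cummax(stop)                    # furthest stop seen so far
  # a new group begins where a start lies beyond everything seen before it
  new_group <- c(TRUE, start[-1] > reach[-n])
  last <- c(which(new_group)[-1] - 1L, n)  # index of the last row of each group
  Map(c, start[new_group], reach[last])
}

Intervals <- list(c(23, 34), c(45, 48), c(31, 35), c(7, 16),
                  c(5, 9), c(56, 57), c(55, 58))
merged <- merge_intervals(Intervals)
chained <- merge_intervals(list(c(1, 3), c(5, 7), c(2, 6)))
```

The work here is dominated by the sort, and intervals that merely touch at an endpoint (such as c(1,5) and c(5,8)) are treated as overlapping, which matches the inclusive test used in the loop version.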



Source: https://stackoverflow.com/questions/35006269/how-to-combine-intervals-data-into-fewer-intervals-in-r
