Question
I'll start with an example, and then describe the logic I'm trying to use.
I have two normal IRanges objects that span the same total range, but may split it into a different number of ranges. Each IRanges has one metadata column (mcol), and that column differs between the two objects.
a
#IRanges object with 1 range and 1 metadata column:
# start end width | on_betalac
# <integer> <integer> <integer> | <logical>
# [1] 1 167 167 | FALSE
b
#IRanges object with 3 ranges and 1 metadata column:
# start end width | on_other
# <integer> <integer> <integer> | <logical>
# [1] 1 107 107 | FALSE
# [2] 108 112 5 | TRUE
# [3] 113 167 55 | FALSE
You can see both of these IRanges span 1 to 167, but a has one range and b has three. I would like to combine them to get output like this:
my_great_function(a, b)
#IRanges object with 3 ranges and 2 metadata columns:
# start end width | on_betalac on_other
# <integer> <integer> <integer> | <logical> <logical>
# [1] 1 107 107 | FALSE FALSE
# [2] 108 112 5 | FALSE TRUE
# [3] 113 167 55 | FALSE FALSE
The output is like a disjoin of the inputs, but it keeps the mcols and spreads them, so that each output range has the same mcol value as the input range it came from.
Answer 1:
Option 1: Using IRanges::findOverlaps
m <- findOverlaps(b, a)
c <- b[queryHits(m)]
mcols(c) <- cbind(mcols(c), mcols(a[subjectHits(m)]))
#IRanges object with 3 ranges and 2 metadata columns:
# start end width | on_other on_betacalc
# <integer> <integer> <integer> | <logical> <logical>
# [1] 1 107 107 | FALSE FALSE
# [2] 108 112 5 | TRUE FALSE
# [3] 113 167 55 | FALSE FALSE
The resulting object c is an IRanges object with two metadata columns.
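Option 1 works here because b already contains every breakpoint, so its ranges can be reused as the output ranges. When neither input is the finer partition, one possible generalisation is to disjoin the pooled ranges first and then look up each piece in both inputs. The following is a minimal sketch under stated assumptions: the helper name combine_by_disjoin is hypothetical (it is not part of IRanges), and each input is assumed to cover the whole span without gaps or internal overlaps.

```r
library(IRanges)

# Sample data, as defined at the end of this answer
a <- IRanges(1, 167)
mcols(a)$on_betacalc <- FALSE
b <- IRanges(c(1, 108, 113), c(107, 112, 167))
mcols(b)$on_other <- c(FALSE, TRUE, FALSE)

# Hypothetical helper (not an IRanges function): split the span at every
# breakpoint of either input, then copy each input's mcols onto the pieces.
combine_by_disjoin <- function(x, y) {
  # Pool the ranges without their mcols and cut them into disjoint pieces
  pieces <- disjoin(c(x, y, ignore.mcols = TRUE))
  # For each piece, index of the (first) range in x and in y that overlaps it
  hx <- findOverlaps(pieces, x, select = "first")
  hy <- findOverlaps(pieces, y, select = "first")
  # Assumption: both inputs cover every piece, so no lookup may be NA
  stopifnot(!anyNA(hx), !anyNA(hy))
  mcols(pieces) <- cbind(mcols(x)[hx, , drop = FALSE],
                         mcols(y)[hy, , drop = FALSE])
  pieces
}

res <- combine_by_disjoin(a, b)
res
```

Because the pieces come from disjoin rather than from b, swapping the arguments changes only the column order, not the ranges.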
Option 2: Using IRanges::mergeByOverlaps
c <- mergeByOverlaps(b, a)
c
#DataFrame with 3 rows and 4 columns
# b on_other a on_betacalc
# <IRanges> <logical> <IRanges> <logical>
#1 1-107 FALSE 1-167 FALSE
#2 108-112 TRUE 1-167 FALSE
#3 113-167 FALSE 1-167 FALSE
The resulting output object is a DataFrame with IRanges columns and original metadata columns as additional columns.
Option 3: Using data.table::foverlaps
library(data.table)
a.dt <- as.data.table(cbind.data.frame(a, mcols(a)))[, width := NULL]
b.dt <- as.data.table(cbind.data.frame(b, mcols(b)))[, width := NULL]
setkey(b.dt, start, end)
foverlaps(a.dt, b.dt, type = "any")[, `:=`(i.start = NULL, i.end = NULL)][]
#   start end on_other on_betacalc
#1:     1 107    FALSE       FALSE
#2:   108 112     TRUE       FALSE
#3:   113 167    FALSE       FALSE
The resulting object is a data.table.
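The foverlaps call above keeps the coordinates of b.dt, which is enough when b is the finer partition. If that is not guaranteed, the overlapping segment itself can be recovered by clipping each matched pair to its intersection. This is only a sketch: the tables are built by hand with data.table() (an assumption made here, to keep the example independent of the IRanges conversion) and mirror the sample data.

```r
library(data.table)

# Tables written out by hand (an assumption) to mirror the sample data
a.dt <- data.table(start = 1L, end = 167L, on_betacalc = FALSE)
b.dt <- data.table(start = c(1L, 108L, 113L),
                   end   = c(107L, 112L, 167L),
                   on_other = c(FALSE, TRUE, FALSE))
setkey(b.dt, start, end)

ov <- foverlaps(a.dt, b.dt, type = "any")

# start/end come from b.dt and i.start/i.end from a.dt;
# the intersection runs from the larger start to the smaller end
res <- ov[, .(start = pmax(start, i.start),
              end   = pmin(end, i.end),
              on_other, on_betacalc)]
res
```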
Option 4: Using fuzzyjoin::interval_left_join
library(fuzzyjoin)
a.df <- cbind.data.frame(a, mcols(a))
b.df <- cbind.data.frame(b, mcols(b))
interval_left_join(b.df, a.df, by = c("start", "end"))
# start.x end.x width.x on_other start.y end.y width.y on_betacalc
#1 1 107 107 FALSE 1 167 167 FALSE
#2 108 112 5 TRUE 1 167 167 FALSE
#3 113 167 55 FALSE 1 167 167 FALSE
The resulting object is a data.frame.
Sample data
library(IRanges)
a <- IRanges(1, 167)
mcols(a)$on_betacalc <- FALSE
b <- IRanges(c(1, 108, 113), c(107, 112, 167))
mcols(b)$on_other <- c(FALSE, TRUE, FALSE)
Answer 2:
Here's what I've been able to come up with. It is not as elegant as MauritsEvers's answer, but it may be useful to others in some way.
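combine_exposures <- function(...) {
  # Concatenate the inputs; mcols missing from an input are filled with NA
  cd <- c(...)
  mc <- mcols(cd)
  # Split into disjoint pieces; revmap lists the input ranges behind each piece
  dj <- disjoin(x = cd, with.revmap = TRUE)
  r <- mcols(dj)$revmap
  d <- as.data.frame(matrix(nrow = length(dj), ncol = ncol(mc)))
  names(d) <- names(mc)
  for (i in seq_along(dj)) {
    # Assumes the j-th contributing range carries the value for column j,
    # i.e. one mcol per input and exactly one range from each input per piece
    d[i, ] <- sapply(X = seq_len(ncol(mc)), FUN = function(j) { mc[r[[i]][j], j] })
  }
  mcols(dj) <- d
  return(dj)
}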
Here is the output of dput(c(e1, e2, e3, e4)), where e1, e2, e3, and e4 are example IRanges objects that each span 1 to 167:
new("IRanges", start = c(1L, 1L, 108L, 113L, 1L, 1L), width = c(167L,
107L, 5L, 55L, 167L, 167L), NAMES = NULL, elementType = "ANY",
elementMetadata = new("DataFrame", rownames = NULL, nrows = 6L,
listData = list(on_betalac = c(FALSE, NA, NA, NA, NA,
NA), on_other = c(NA, FALSE, TRUE, FALSE, NA, NA), on_pen = c(NA,
NA, NA, NA, FALSE, NA), on_quin = c(NA, NA, NA, NA, NA,
FALSE)), elementType = "ANY", elementMetadata = NULL,
metadata = list()), metadata = list())
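The function above matches the j-th contributing range to the j-th column, which holds for this data but depends on the order and number of the inputs. A possible variant, sketched here under stated assumptions, instead takes the first non-NA value of each column among the ranges behind each piece. The name spread_mcols is hypothetical, the combined object is rebuilt by hand from the dput() output, and all metadata columns are assumed to be logical.

```r
library(IRanges)

# Combined object rebuilt from the dput() output above
cd <- IRanges(start = c(1L, 1L, 108L, 113L, 1L, 1L),
              width = c(167L, 107L, 5L, 55L, 167L, 167L))
mcols(cd) <- DataFrame(
  on_betalac = c(FALSE, NA, NA, NA, NA, NA),
  on_other   = c(NA, FALSE, TRUE, FALSE, NA, NA),
  on_pen     = c(NA, NA, NA, NA, FALSE, NA),
  on_quin    = c(NA, NA, NA, NA, NA, FALSE)
)

# Hypothetical variant of combine_exposures() that takes the combined object
spread_mcols <- function(cd) {
  mc <- mcols(cd)
  dj <- disjoin(cd, with.revmap = TRUE)
  # One integer vector per piece: indices of the input ranges behind it
  revmap <- as.list(mcols(dj)$revmap)
  cols <- lapply(names(mc), function(nm) {
    vals <- mc[[nm]]
    vapply(revmap, function(idx) {
      v <- vals[idx]
      v <- v[!is.na(v)]
      # First non-NA value, or NA if no contributing range has one
      if (length(v) > 0) v[1] else NA
    }, logical(1))  # assumption: every metadata column is logical
  })
  names(cols) <- names(mc)
  mcols(dj) <- do.call(DataFrame, cols)  # replaces the revmap column
  dj
}

res <- spread_mcols(cd)
res
```

If two inputs ever supplied conflicting non-NA values for the same column and piece, this sketch would silently keep the first one, so a stricter version might check for that case.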
Source: https://stackoverflow.com/questions/55582833/combining-iranges-objects-and-maintaining-mcols