Take the following generic data:
A <- c(5,7,11,10,23,30,24,6)
B <- c(1,2,3,1,2,3,1,2)
C <- data.frame(A,B)
and the following intervals:
library(intervals)
interval1 <- Intervals(
  matrix(
    c(
      5, 15,
      15, 25,
      25, 35,
      35, 100
    ),
    ncol = 2, byrow = TRUE
  ),
  closed = c( TRUE, FALSE ),
  type = "Z"
)
rownames(interval1) <- c("A","B","C", "D")
interval2 <- Intervals(
  matrix(
    c(
      0, 10,
      12, 20,
      22, 30,
      30, 100
    ),
    ncol = 2, byrow = TRUE
  ),
  closed = c( TRUE, FALSE ),
  type = "Z"
)
rownames(interval2) <- c("P","Q","R", "S")
Now I want to create the following output table

So where an A value overlaps both intervals, I want to 'copy' all the data to a line below.
We also introduce data$X, which is the interval1 value, and data$y, which is the interval2 value.
Where data does not fit within any of the intervals, I want to remove it from the data.frame.
I am not sure whether the break() function would be better for creating the intervals, or whether dplyr can be used to create the repeated data rows.
You can use foverlaps in data.table:
library(data.table)
C.DT <- data.table(C)
C.DT[, A1 := A] # `foverlaps` needs a start and an end column, so each value becomes the range [A, A1]
# `D` and `E` are your interval matrices (the endpoint matrices behind interval1 and interval2);
# for an unnamed matrix, data.frame() names the columns X1 and X2
I1 <- data.table(cbind(data.frame(D), idX = LETTERS[1:4], idY = NA))
I2 <- data.table(cbind(data.frame(E), idX = NA, idY = LETTERS[16:19]))
setkey(I1, X1, X2) # set the keys on our interval ranges
setkey(I2, X1, X2)
rbind(
  foverlaps(C.DT, I1, by.x = c("A", "A1"), nomatch = 0), # match every value in `C.DT$A` to the ranges in `I1`
  foverlaps(C.DT, I2, by.x = c("A", "A1"), nomatch = 0)  # same for `I2`; nomatch = 0 drops values that fall in no range
)[order(A, B), .(A, B, X = idX, Y = idY)]
Produces:
     A B  X  Y
 1:  5 1  A NA
 2:  5 1 NA  P
 3:  6 2  A NA
 4:  6 2 NA  P
 5:  7 2  A NA
 6:  7 2 NA  P
 7: 10 1  A NA
 8: 10 1 NA  P
 9: 11 3  A NA
10: 23 2  B NA
11: 23 2 NA  R
12: 24 1  B NA
13: 24 1 NA  R
14: 30 3  C NA
15: 30 3 NA  R
16: 30 3 NA  S
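One detail worth checking in the output above: 10 is matched to P and 30 to both R and S, so the join appears to treat both endpoints as inclusive, whereas the question declares the intervals with closed = c(TRUE, FALSE). If half-open behaviour is what you want, one possible approach is to filter the join result on the upper endpoint. This is only a sketch and not part of the original answer; the names vals, closed_both and half_open are made up for illustration.

```r
library(data.table)

# Two boundary values from the question's data (hypothetical helper table)
vals <- data.table(A = c(10, 30))
vals[, A1 := A]  # zero-width range [A, A] for the overlap join

# Endpoints and labels of interval2 from the question
I2 <- data.table(X1 = c(0, 12, 22, 30),
                 X2 = c(10, 20, 30, 100),
                 idY = LETTERS[16:19])
setkey(I2, X1, X2)

# Overlap join as in the answer: both endpoints count as matches
closed_both <- foverlaps(vals, I2, by.x = c("A", "A1"), nomatch = 0)

# Emulate [start, end) by dropping matches that sit exactly on the upper endpoint
half_open <- closed_both[A < X2]

print(closed_both[, .(A, idY)])
print(half_open[, .(A, idY)])
```

With the filter applied, 10 no longer falls in any range of interval2 and 30 is kept only for S.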
Note that you can easily change what you get instead of NA by modifying the steps where I1 and I2 are created.
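As a rough illustration of that last point, here is a self-contained sketch that swaps NA for a placeholder string. It assumes D and E are plain matrices built from the question's endpoints (the answer does not show how they were created), and "-" is an arbitrary placeholder chosen for this example.

```r
library(data.table)

# Data from the question
C.DT <- data.table(A = c(5, 7, 11, 10, 23, 30, 24, 6),
                   B = c(1, 2, 3, 1, 2, 3, 1, 2))
C.DT[, A1 := A]  # zero-width range [A, A] for each value

# Assumption: D and E hold the endpoints of interval1 and interval2
D <- matrix(c(5, 15, 15, 25, 25, 35, 35, 100), ncol = 2, byrow = TRUE)
E <- matrix(c(0, 10, 12, 20, 22, 30, 30, 100), ncol = 2, byrow = TRUE)

# Use "-" instead of NA for the label that does not apply
I1 <- data.table(X1 = D[, 1], X2 = D[, 2], idX = LETTERS[1:4], idY = "-")
I2 <- data.table(X1 = E[, 1], X2 = E[, 2], idX = "-", idY = LETTERS[16:19])
setkey(I1, X1, X2)
setkey(I2, X1, X2)

res <- rbind(
  foverlaps(C.DT, I1, by.x = c("A", "A1"), nomatch = 0),
  foverlaps(C.DT, I2, by.x = c("A", "A1"), nomatch = 0)
)[order(A, B), .(A, B, X = idX, Y = idY)]
print(res)
```

The row layout should be the same as in the output shown earlier, with "-" wherever NA appeared.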
Source: https://stackoverflow.com/questions/30302483/using-intervals-to-assign-categorical-values