Question
Take the following generic data
A <- c(5,7,11,10,23,30,24,6)
B <- c(1,2,3,1,2,3,1,2)
C <- data.frame(A,B)
and the following intervals
library(intervals)
interval1 <- Intervals(
matrix(
c(
5, 15,
15, 25,
25, 35,
35, 100
),
ncol = 2, byrow = TRUE
),
closed = c( TRUE, FALSE ),
type = "Z"
)
rownames(interval1) <- c("A","B","C", "D")
interval2 <- Intervals(
matrix(
c(
0, 10,
12, 20,
22, 30,
30, 100
),
ncol = 2, byrow = TRUE
),
closed = c( TRUE, FALSE ),
type = "Z"
)
rownames(interval2) <- c("P","Q","R", "S")
Now I want to create the following output table

So where an A value overlaps both intervals, I want to 'copy' all the data to a line below.
We also introduce data$X, which is the interval1 value, and data$y, which is the interval2 value.
Where a value does not fit within any of the intervals, I want to remove it from the data.frame.
I am not sure whether the break() function would be better for creating the intervals, or whether dplyr can be used to produce the repeated data rows.
Answer 1:
You can use foverlaps in data.table:
library(data.table)
# `D` and `E` are your interval matrices; they are assumed here to be the
# plain matrices underlying `interval1` and `interval2` from the question
D <- interval1@.Data
E <- interval2@.Data
C.DT <- data.table(C)
C.DT[, A1 := A] # required for `foverlaps` so we can do a range search
I1 <- data.table(cbind(data.frame(D), idX = LETTERS[1:4], idY = NA))
I2 <- data.table(cbind(data.frame(E), idX = NA, idY = LETTERS[16:19]))
setkey(I1, X1, X2) # set the keys on our interval ranges
setkey(I2, X1, X2)
rbind(
  foverlaps(C.DT, I1, by.x = c("A", "A1"), nomatch = 0), # match every value in `C.DT$A` to the ranges in `I1`
  foverlaps(C.DT, I2, by.x = c("A", "A1"), nomatch = 0)  # same for the ranges in `I2`
)[order(A, B), .(A, B, X = idX, Y = idY)]
Produces:
     A B  X  Y
 1:  5 1  A NA
 2:  5 1 NA  P
 3:  6 2  A NA
 4:  6 2 NA  P
 5:  7 2  A NA
 6:  7 2 NA  P
 7: 10 1  A NA
 8: 10 1 NA  P
 9: 11 3  A NA
10: 23 2  B NA
11: 23 2 NA  R
12: 24 1  B NA
13: 24 1 NA  R
14: 30 3  C NA
15: 30 3 NA  R
16: 30 3 NA  S
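In this output, 30 matches both R (22 to 30) and S (30 to 100), and 10 matches P (0 to 10). So the join counts both endpoints of a range as inside it, whereas the question declares its intervals closed on the left only. If the half-open reading matters, a minimal base R sketch could look like the following. The helper `label_half_open` and the names `b1` and `b2` are hypothetical, and the result is a wide table with one row per value and X and Y side by side, not the duplicated-row layout above.

```r
# Data and interval bounds copied from the question
A <- c(5, 7, 11, 10, 23, 30, 24, 6)
B <- c(1, 2, 3, 1, 2, 3, 1, 2)
b1 <- matrix(c(5, 15, 15, 25, 25, 35, 35, 100), ncol = 2, byrow = TRUE)
b2 <- matrix(c(0, 10, 12, 20, 22, 30, 30, 100), ncol = 2, byrow = TRUE)

# Hypothetical helper: for each x, return the label of the first interval
# [lower, upper) that contains it, or NA when no interval does
label_half_open <- function(x, bounds, labels) {
  vapply(x, function(v) {
    hit <- which(v >= bounds[, 1] & v < bounds[, 2])
    if (length(hit) > 0) labels[hit[1]] else NA_character_
  }, character(1))
}

C <- data.frame(A, B)
C$X <- label_half_open(C$A, b1, c("A", "B", "C", "D"))
C$Y <- label_half_open(C$A, b2, c("P", "Q", "R", "S"))

# Drop rows that fall in no interval of either set
C <- C[!(is.na(C$X) & is.na(C$Y)), ]
```

Under this reading 10 gets no Y label and 30 is labelled S only, so it is worth checking which behaviour you actually want at the boundaries.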
Note that you can easily change what you get instead of NA by modifying the steps where I1 and I2 are created.
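As a rough sketch of that last point, the version below fills the unused label column with a placeholder string instead of NA. The placeholder "-" is an arbitrary choice for illustration, and the interval bounds are typed in as plain matrices so the snippet runs on its own.

```r
library(data.table)

# Data and interval bounds copied from the question (bounds as plain matrices)
C.DT <- data.table(A = c(5, 7, 11, 10, 23, 30, 24, 6),
                   B = c(1, 2, 3, 1, 2, 3, 1, 2))
C.DT[, A1 := A]  # zero-width range [A, A] so each value can be range-joined
D <- matrix(c(5, 15, 15, 25, 25, 35, 35, 100), ncol = 2, byrow = TRUE)
E <- matrix(c(0, 10, 12, 20, 22, 30, 30, 100), ncol = 2, byrow = TRUE)

# "-" is a hypothetical placeholder taking the place of NA
I1 <- data.table(cbind(data.frame(D), idX = LETTERS[1:4], idY = "-"))
I2 <- data.table(cbind(data.frame(E), idX = "-", idY = LETTERS[16:19]))
setkey(I1, X1, X2)
setkey(I2, X1, X2)

res <- rbind(
  foverlaps(C.DT, I1, by.x = c("A", "A1"), nomatch = 0),
  foverlaps(C.DT, I2, by.x = c("A", "A1"), nomatch = 0)
)[order(A, B), .(A, B, X = idX, Y = idY)]
```

The rows are the same as before; only the filler in the X and Y columns changes.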
Source: https://stackoverflow.com/questions/30302483/using-intervals-to-assign-categorical-values