Using intervals to assign categorical values

Take the following generic data:

A <- c(5,7,11,10,23,30,24,6)
B <- c(1,2,3,1,2,3,1,2)
C <- data.frame(A,B)

and the following intervals:

library(intervals)
interval1 <- Intervals(
  matrix(
    c(
      5, 15,
      15, 25,
      25, 35,
      35, 100
    ),
    ncol = 2, byrow = TRUE
  ),
  closed = c( TRUE, FALSE ),
  type = "Z"
)
rownames(interval1) <- c("A","B","C", "D")

interval2 <- Intervals(
  matrix(
    c(
      0, 10,
      12, 20,
      22, 30,
      30, 100
    ),
    ncol = 2, byrow = TRUE
  ),
  closed = c( TRUE, FALSE ),
  type = "Z"
)
rownames(interval2) <- c("P","Q","R", "S")

Now I want to create an output table from this data.

Where an A value falls within intervals from both sets, I want to copy the whole row onto a line below. The output also introduces two new columns: data$X, which holds the matching interval1 label, and data$Y, which holds the matching interval2 label. Rows whose A value does not fit within any of the intervals should be removed from the data.frame.

I am not sure whether the break() function would be a better way to create the intervals, or whether dplyr can be used to create the repeated data rows.

You can use foverlaps in data.table:

library(data.table)
C.DT <- data.table(C)
C.DT[, A1:=A] # required for `foverlaps` so we can do a range search

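# `D` and `E` are your interval matrices; they are assumed here to be plain
# two-column matrices holding the bounds of `interval1` and `interval2`
D <- matrix(c(5, 15, 15, 25, 25, 35, 35, 100), ncol = 2, byrow = TRUE)
E <- matrix(c(0, 10, 12, 20, 22, 30, 30, 100), ncol = 2, byrow = TRUE)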

I1 <- data.table(cbind(data.frame(D), idX=LETTERS[1:4], idY=NA))
I2 <- data.table(cbind(data.frame(E), idX=NA, idY=LETTERS[16:19]))

setkey(I1, X1, X2)  # set the keys on our interval ranges
setkey(I2, X1, X2)

rbind(
  foverlaps(C.DT, I1, by.x=c("A", "A1"), nomatch=0), # match every value in `C.DT$A` to the ranges in `I1` 
  foverlaps(C.DT, I2, by.x=c("A", "A1"), nomatch=0)
)[order(A, B), .(A, B, X=idX, Y=idY)]

Produces:

     A B  X  Y
 1:  5 1  A NA
 2:  5 1 NA  P
 3:  6 2  A NA
 4:  6 2 NA  P
 5:  7 2  A NA
 6:  7 2 NA  P
 7: 10 1  A NA
 8: 10 1 NA  P
 9: 11 3  A NA
10: 23 2  B NA
11: 23 2 NA  R
12: 24 1  B NA
13: 24 1 NA  R
14: 30 3  C NA
15: 30 3 NA  R
16: 30 3 NA  S
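
One thing to watch in the output above: foverlaps treats both endpoints of a range as inclusive, so 10 is matched to P and 30 is matched to R, even though the question declared the intervals with closed = c(TRUE, FALSE). If the half-open behaviour matters, a minimal base R sketch could look like the one below. This is a different technique from foverlaps, not part of the original answer. The helper name label_half_open is made up for illustration, and the sketch assumes the intervals within each set do not overlap each other.

```r
A <- c(5, 7, 11, 10, 23, 30, 24, 6)
B <- c(1, 2, 3, 1, 2, 3, 1, 2)
C <- data.frame(A, B)

# Interval bounds as plain matrices: column 1 = lower bound, column 2 = upper bound
D <- matrix(c(5, 15, 15, 25, 25, 35, 35, 100), ncol = 2, byrow = TRUE)
E <- matrix(c(0, 10, 12, 20, 22, 30, 30, 100), ncol = 2, byrow = TRUE)

# Hypothetical helper: label each value with the half-open interval
# [lower, upper) it falls in, or NA when it falls in none.
# Assumes the intervals in `bounds` do not overlap each other.
label_half_open <- function(x, bounds, labels) {
  out <- rep(NA_character_, length(x))
  for (i in seq_len(nrow(bounds))) {
    hit <- x >= bounds[i, 1] & x < bounds[i, 2]
    out[hit] <- labels[i]
  }
  out
}

x_lab <- label_half_open(C$A, D, c("A", "B", "C", "D"))
y_lab <- label_half_open(C$A, E, c("P", "Q", "R", "S"))

# One block of rows per interval set, stacked into the long layout
long <- rbind(
  data.frame(C[c("A", "B")], X = x_lab, Y = NA_character_,
             stringsAsFactors = FALSE),
  data.frame(C[c("A", "B")], X = NA_character_, Y = y_lab,
             stringsAsFactors = FALSE)
)

# Drop rows that matched no interval, then sort like the foverlaps result
long <- long[!(is.na(long$X) & is.na(long$Y)), ]
long <- long[order(long$A, long$B), ]
rownames(long) <- NULL
long
```

With half-open bounds, 10 no longer gets a Y label (it sits in the gap between P and Q) and 30 is labelled only S, so the result has 14 rows instead of 16.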

Note that you can easily change what you get instead of NA by modifying the steps where I1 and I2 are created.
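
As a rough illustration of that note, the sketch below builds I1 and I2 with the placeholder string "none" in place of NA. The placeholder value is arbitrary and chosen only for this example, and the interval tables are built directly with data.table() instead of going through the matrices, which is an assumption made to keep the example self-contained.

```r
library(data.table)

C.DT <- data.table(A = c(5, 7, 11, 10, 23, 30, 24, 6),
                   B = c(1, 2, 3, 1, 2, 3, 1, 2))
C.DT[, A1 := A]  # duplicate column so each value is a zero-width range

# Same ranges as before, but with "none" (arbitrary placeholder) instead of NA
I1 <- data.table(X1 = c(5, 15, 25, 35), X2 = c(15, 25, 35, 100),
                 idX = LETTERS[1:4], idY = "none")
I2 <- data.table(X1 = c(0, 12, 22, 30), X2 = c(10, 20, 30, 100),
                 idX = "none", idY = LETTERS[16:19])

setkey(I1, X1, X2)
setkey(I2, X1, X2)

res <- rbind(
  foverlaps(C.DT, I1, by.x = c("A", "A1"), nomatch = 0),
  foverlaps(C.DT, I2, by.x = c("A", "A1"), nomatch = 0)
)[order(A, B), .(A, B, X = idX, Y = idY)]
res
```

The rows are the same 16 as in the output above; only the cells that were NA now read "none".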

Source: https://stackoverflow.com/questions/30302483/using-intervals-to-assign-categorical-values