Return values with matching conditions in R

Question

I would like to return the values in one column that correspond to cut scores in another column. If a cut score is not present in that column, I would like to use the closest larger value instead. Here is a snapshot of the dataset:

ids <- c(1,2,3,4,5,6,7,8,9,10)
scores.a <- c(512,531,541,555,562,565,570,572,573,588)
scores.b <- c(12,13,14,15,16,17,18,19,20,21)
data <- data.frame(ids, scores.a, scores.b)
> data
   ids scores.a scores.b
1    1      512       12
2    2      531       13
3    3      541       14
4    4      555       15
5    5      562       16
6    6      565       17
7    7      570       18
8    8      572       19
9    9      573       20
10  10      588       21

cuts <- c(531, 560, 571)

For the first cut score (531), I would like to grab the corresponding scores.b value, which is 13. The second cut score (560) does not appear in scores.a, so I would like to use the closest larger scores.a value, 562, whose corresponding scores.b value is 16. Lastly, for the third cut score (571), I would like to get 19, which corresponds to the closest larger scores.a value (572).

Here is what I would like to get.

       scores.b
cut.1  13
cut.2  16
cut.3  19

Any thoughts? Thanks

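Answer 1:

We can use a rolling join from data.table. With roll = -Inf, a cut that has no exact match in scores.a takes the next larger value (next observation carried backward), so each cut is matched to the closest scores.a at or above it: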

library(data.table)
setDT(data)[data.table(cuts = cuts), .(ids = ids, cuts, scores.b), 
          on = .(scores.a = cuts), roll = -Inf]
#   ids cuts scores.b
#1:   2  531       13
#2:   5  560       16
#3:   8  571       19
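
To make the direction of the roll concrete, here is a minimal sketch that contrasts roll = -Inf with roll = Inf on the same data. The names dt, lookup, nocb and locf are illustrative and not part of the original answer; the sketch assumes the data.table package is installed.

```r
library(data.table)

dt <- data.table(
  scores.a = c(512, 531, 541, 555, 562, 565, 570, 572, 573, 588),
  scores.b = c(12, 13, 14, 15, 16, 17, 18, 19, 20, 21)
)
lookup <- data.table(scores.a = c(531, 560, 571))  # the cut scores

# roll = -Inf: an unmatched cut takes the next larger scores.a (560 -> 562).
nocb <- dt[lookup, on = "scores.a", roll = -Inf]

# roll = Inf: an unmatched cut takes the previous smaller scores.a (560 -> 555).
locf <- dt[lookup, on = "scores.a", roll = Inf]

nocb$scores.b
#[1] 13 16 19
locf$scores.b
#[1] 13 15 18
```

Note that in the joined result the scores.a column holds the cut values from lookup rather than the matched scores, which is why the answer above reports them in a column named cuts.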

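Another option is findInterval from base R. Because findInterval locates the closest value at or below each input, negate and reverse both vectors so that it finds the closest value at or above each cut instead. This assumes scores.a is sorted in increasing order: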

with(data, scores.b[rev(nrow(data) + 1 - findInterval(rev(-cuts), rev(-scores.a)))])
#[1] 13 16 19
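
The one-liner is dense, so here is the same computation unrolled into named steps. This is only a sketch of how the expression above can be read; the intermediate names (neg.scores, neg.cuts, n.at.or.above, idx) are illustrative and do not appear in the original answer.

```r
scores.a <- c(512, 531, 541, 555, 562, 565, 570, 572, 573, 588)
scores.b <- c(12, 13, 14, 15, 16, 17, 18, 19, 20, 21)
cuts <- c(531, 560, 571)

# Negate and reverse so the scores are still in increasing order,
# which findInterval() requires of its second argument.
neg.scores <- rev(-scores.a)
neg.cuts <- rev(-cuts)

# For each negated cut, count the negated scores that are <= it,
# i.e. the number of original scores that are >= the original cut.
n.at.or.above <- findInterval(neg.cuts, neg.scores)

# Turn those counts back into positions in the original ordering.
idx <- rev(length(scores.a) + 1 - n.at.or.above)

idx
#[1] 2 5 8
scores.b[idx]
#[1] 13 16 19
```

If a cut is larger than every value in scores.a, the count is 0, the position becomes length(scores.a) + 1, and indexing scores.b there returns NA.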

Answer 2:

This keeps the other columns rather than returning only scores.b, which makes it easier to see which row each cut was matched to:

df1 <- data[match(seq_along(cuts), findInterval(data$scores.a, cuts)), ]
rownames(df1) <- paste("cuts", seq_along(cuts), sep = ".")

> df1
       ids scores.a scores.b
cuts.1   2      531       13
cuts.2   5      562       16
cuts.3   8      572       19
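
For comparison, here is a minimal base R sketch that states the rule from the question directly (the smallest scores.a that is at or above the cut) and returns the result in the layout the question asks for. The helper first_at_or_above is hypothetical and not taken from either answer. Unlike the findInterval approaches, it does not assume scores.a is sorted, and it returns NA when no score reaches a cut.

```r
data <- data.frame(
  ids = 1:10,
  scores.a = c(512, 531, 541, 555, 562, 565, 570, 572, 573, 588),
  scores.b = c(12, 13, 14, 15, 16, 17, 18, 19, 20, 21)
)
cuts <- c(531, 560, 571)

# Hypothetical helper: row index of the smallest score that is >= cut,
# or NA when no score reaches the cut.
first_at_or_above <- function(cut, scores) {
  hits <- which(scores >= cut)
  if (length(hits) == 0) NA_integer_ else hits[which.min(scores[hits])]
}

idx <- vapply(cuts, first_at_or_above, integer(1), scores = data$scores.a)

out <- data.frame(
  scores.b = data$scores.b[idx],
  row.names = paste("cut", seq_along(cuts), sep = ".")
)
out
#      scores.b
#cut.1       13
#cut.2       16
#cut.3       19
```

This scans scores.a once per cut, so for very long vectors the rolling join or findInterval is likely the better fit.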

Source: https://stackoverflow.com/questions/59570555/return-values-with-matching-conditions-in-r

Tags

subset