问题
I would like to return values with matching conditions in another column based on a cut score criterion. If the cut scores are not available in the variable, I would like to grab closest larger value. Here is a snapshot of dataset:
ids <- c(1,2,3,4,5,6,7,8,9,10)
scores.a <- c(512,531,541,555,562,565,570,572,573,588)
scores.b <- c(12,13,14,15,16,17,18,19,20,21)
data <- data.frame(ids, scores.a, scores.b)
> data
ids scores.a scores.b
1 1 512 12
2 2 531 13
3 3 541 14
4 4 555 15
5 5 562 16
6 6 565 17
7 7 570 18
8 8 572 19
9 9 573 20
10 10 588 21
cuts <- c(531, 560, 571)
I would like to grab score.b
value corresponding to the first cut score, which is 13
. Then, grab score.b value corresponding to the second cut (560
) score but it is not in the score.a, so I would like to get the score.a value 562
(closest to 560
), and the corresponding value would be 16
. Lastly, for the third cut score (571
), I would like to get 19 which is the corresponding value of the closest value (572
) to the third cut score.
Here is what I would like to get.
scores.b
cut.1 13
cut.2 16
cut.3 19
Any thoughts? Thanks
回答1:
We can use a rolling join
library(data.table)
setDT(data)[data.table(cuts = cuts), .(ids = ids, cuts, scores.b),
on = .(scores.a = cuts), roll = -Inf]
# ids cuts scores.b
#1: 2 531 13
#2: 5 560 16
#3: 8 571 19
Or another option is findInterval
from base R
after changing the sign and taking the rev
erse
with(data, scores.b[rev(nrow(data) + 1 - findInterval(rev(-cuts), rev(-scores.a)))])
#[1] 13 16 19
回答2:
This doesn't remove the other columns, but this illustrates correct results better
df1 <- data[match(seq_along(cuts), findInterval(data$scores.a, cuts)), ]
rownames(df1) <- paste("cuts", seq_along(cuts), sep = ".")
> df1
ids scores.a scores.b
cuts.1 2 531 13
cuts.2 5 562 16
cuts.3 8 572 19
来源:https://stackoverflow.com/questions/59570555/return-values-with-matching-conditions-in-r