findInterval() with right-closed intervals

Posted by 怎甘沉沦 on 2019-11-27 11:16:33

Question


The great findInterval() function in R uses left-closed sub-intervals in its vec argument, as shown in its docs:

if i <- findInterval(x,v), we have v[i[j]] <= x[j] < v[i[j] + 1]

If I want right-closed sub-intervals, what are my options? The best I've come up with is this:

findInterval.rightClosed <- function(x, vec, ...) {
  fi <- findInterval(x, vec, ...)
  fi - (x==vec[fi])
}

Another one also works:

findInterval.rightClosed2 <- function(x, vec, ...) {
  length(vec) - findInterval(-x, -rev(vec), ...)
}
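One way to see why the negate-and-reverse version works is to walk through it step by step. This is only an illustrative sketch: the names `breaks`, `pts`, `neg_breaks`, `n_at_or_above`, `n_below` and `brute` are made up here and do not come from the original post. The idea is that findInterval(-x, -rev(vec)) counts the breaks that are greater than or equal to each x, so subtracting that count from length(vec) leaves the number of breaks strictly below x, which is the right-closed index.

```r
breaks <- c(2, 5, 6, 35)
pts <- c(3, 6, 7, 35, 52)

# Negating and reversing keeps the break vector sorted in increasing order.
neg_breaks <- -rev(breaks)   # -35 -6 -5 -2

# findInterval() counts breaks b with -b <= -x, i.e. b >= x.
n_at_or_above <- findInterval(-pts, neg_breaks)

# The remaining breaks are strictly below x: the right-closed interval index.
n_below <- length(breaks) - n_at_or_above

# Brute-force check of the same count.
brute <- sapply(pts, function(p) sum(breaks < p))

print(n_below)                    # 1 2 3 3 4
print(identical(n_below, brute))  # TRUE
```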

Here's a little test:

x <- c(3, 6, 7, 7, 29, 37, 52)
vec <- c(2, 5, 6, 35)
findInterval(x, vec)
# [1] 1 3 3 3 3 4 4
findInterval.rightClosed(x, vec)
# [1] 1 2 3 3 3 4 4
findInterval.rightClosed2(x, vec)
# [1] 1 2 3 3 3 4 4

But I'd like to see any other solutions if there's a better one. By "better", I mean "somehow more satisfying" or "doesn't feel like a kludge" or maybe even "more efficient". =)

(Note that there's a rightmost.closed argument to findInterval(), but it's different - it only refers to the final sub-interval and has a different meaning.)
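A small sketch of that difference, using values that sit exactly on the breaks (the variable names `breaks`, `pts`, `default_idx` and `closed_idx` are illustrative only): with rightmost.closed = TRUE, only the value equal to the last break changes its index, while interior breaks are still treated as left-closed.

```r
breaks <- c(2, 5, 6, 35)
pts <- c(5, 6, 35)   # every value sits exactly on a break

default_idx <- findInterval(pts, breaks)
closed_idx  <- findInterval(pts, breaks, rightmost.closed = TRUE)

print(default_idx)  # 2 3 4
print(closed_idx)   # 2 3 3  (only the value at the last break moved)
```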


Answer 1:


EDIT: Major clean-up in all aisles.

You might look at cut. By default, cut makes left open and right closed intervals, and that can be changed using the appropriate argument (right). To use your example:

x <- c(3, 6, 7, 7, 29, 37, 52)
vec <- c(2, 5, 6, 35)
cutVec <- c(vec, max(x)) # for cut, range of vec should cover all of x

Now create four functions that should do the same thing: Two from the OP, one from Josh O'Brien, and then cut. Two arguments to cut have been changed from default settings: include.lowest = TRUE will create an interval closed on both sides for the smallest (leftmost) interval. labels = FALSE will cause cut to return simply the integer values for the bins instead of creating a factor, which it otherwise does.

findInterval.rightClosed <- function(x, vec, ...) {
  fi <- findInterval(x, vec, ...)
  fi - (x==vec[fi])
}
findInterval.rightClosed2 <- function(x, vec, ...) {
  length(vec) - findInterval(-x, -rev(vec), ...)
}
cutFun <- function(x, vec){
    cut(x, vec, include.lowest = TRUE, labels = FALSE)
}
# The body of fiFun is a contribution by Josh O'Brien that got fed to the ether.
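fiFun <- function(x, vec){
    # Shift each positive, finite break up by a relative machine epsilon so that
    # values equal to a break land in the interval below it.
    findInterval(x, vec * (1 + .Machine$double.eps))
}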
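To make the effect of the two non-default cut arguments concrete, here is a minimal sketch on values that coincide with the breaks. The names `breaks`, `pts`, `as_factor`, `codes_open` and `codes_closed` are illustrative and not part of the original answer.

```r
breaks <- c(2, 5, 6, 35)
pts <- c(2, 5, 6)

# Defaults: intervals are (2,5], (5,6], (6,35], so the value 2 falls outside.
as_factor <- cut(pts, breaks)
print(levels(as_factor))   # "(2,5]" "(5,6]" "(6,35]"
print(is.na(as_factor))    # TRUE FALSE FALSE

# labels = FALSE returns integer bin codes instead of a factor;
# include.lowest = TRUE closes the first interval at its lower end.
codes_open   <- cut(pts, breaks, labels = FALSE)
codes_closed <- cut(pts, breaks, labels = FALSE, include.lowest = TRUE)
print(codes_open)    # NA 1 2
print(codes_closed)  # 1 1 2
```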

Do all functions return the same result? Yup. (notice the use of cutVec for cutFun)

mapply(identical, list(findInterval.rightClosed(x, vec)),
  list(findInterval.rightClosed2(x, vec), cutFun(x, cutVec), fiFun(x, vec)))
# [1] TRUE TRUE TRUE

Now a more demanding vector to bin:

x <- rpois(2e6, 10)
vec <- c(-Inf, quantile(x, seq(.2, 1, .2)))

Test whether identical (note use of unname)

mapply(identical, list(unname(findInterval.rightClosed(x, vec))),
  list(findInterval.rightClosed2(x, vec), cutFun(x, vec), fiFun(x, vec)))
# [1] TRUE TRUE TRUE

And benchmark:

library(microbenchmark)
microbenchmark(findInterval.rightClosed(x, vec), findInterval.rightClosed2(x, vec),
  cutFun(x, vec), fiFun(x, vec), times = 50)
# Unit: milliseconds
#                                expr       min        lq    median        uq       max
# 1                    cutFun(x, vec)  35.46261  35.63435  35.81233  36.68036  53.52078
# 2                     fiFun(x, vec)  51.30158  51.69391  52.24277  53.69253  67.09433
# 3  findInterval.rightClosed(x, vec) 124.57110 133.99315 142.06567 155.68592 176.43291
# 4 findInterval.rightClosed2(x, vec)  79.81685  82.01025  86.20182  95.65368 108.51624

From this run, cut seems to be the fastest.




Answer 2:


Maybe you can use the option left.open:

findInterval(x, vec, left.open = TRUE)
# [1] 1 2 3 3 3 4 4
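As a quick cross-check, the following sketch compares left.open = TRUE against one of the helper functions from the question on the same sample data. It assumes an R version whose findInterval() accepts the left.open argument; `via_left_open` and `via_helper` are names introduced here for illustration.

```r
x <- c(3, 6, 7, 7, 29, 37, 52)
vec <- c(2, 5, 6, 35)

# Helper copied from the question, used here only as a reference result.
findInterval.rightClosed2 <- function(x, vec, ...) {
  length(vec) - findInterval(-x, -rev(vec), ...)
}

via_left_open <- findInterval(x, vec, left.open = TRUE)
via_helper    <- findInterval.rightClosed2(x, vec)

print(via_left_open)                         # 1 2 3 3 3 4 4
print(identical(via_left_open, via_helper))  # TRUE
```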



Answer 3:


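If your limits describe a single interval (a lower and an upper bound), you can simply push the upper bound out a bit. interval + c(0, 0.1) would do: findInterval(value, interval + c(0, 0.1)). This only works if none of your values fall strictly between the original upper bound and the shifted one.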
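A minimal sketch of this trick and its limitation, with made-up numbers (`interval`, `value`, `plain` and `nudged` are hypothetical, chosen only for illustration):

```r
interval <- c(0, 10)                 # hypothetical lower and upper limit
value <- c(0, 5, 10, 10.05, 10.2)

plain  <- findInterval(value, interval)
nudged <- findInterval(value, interval + c(0, 0.1))

print(plain)   # 1 1 2 2 2
print(nudged)  # 1 1 1 1 2
```

After the shift, the value 10 is counted as inside the interval, but so is 10.05, and the lower limit 0 is still included. The result is therefore only equivalent to a right-closed interval when the data cannot fall in the added margin.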



Source: https://stackoverflow.com/questions/13482872/findinterval-with-right-closed-intervals
