Fastest way to find the index of the second (third...) highest/lowest value in vector or column ?
i.e. what
sort(x,partial=n-1)[n-1]
Method: Set all max values to -Inf, then find the indices of the max. No sorting required.
X <- runif(1e7)
system.time(
{
X[X == max(X)] <- -Inf
which(X == max(X))
})
Works with ties and is very fast.
If you can guarantee no ties, then an even faster version is
system.time(
{
X[which.max(X)] <- -Inf
which.max(X)
})
EDIT: As Joris mentioned, this method doesn't scale that well for finding third, fourth, etc., highest values.
which_nth_highest_richie <- function(x, n)
{
for(i in seq_len(n - 1L)) x[x == max(x)] <- -Inf
which(x == max(x))
}
which_nth_highest_joris <- function(x, n)
{
ux <- unique(x)
nux <- length(ux)
which(x == sort(ux, partial = nux - n + 1)[nux - n + 1])
}
Using x <- runif(1e7) and n = 2, Richie wins
system.time(which_nth_highest_richie(x, 2)) #about half a second
system.time(which_nth_highest_joris(x, 2)) #about 2 seconds
For n = 100, Joris wins
system.time(which_nth_highest_richie(x, 100)) #about 20 seconds, ouch!
system.time(which_nth_highest_joris(x, 100)) #still about 2 seconds
The balance point, where they take the same length of time, is about n = 10.