Fastest way to find *the index* of the second (third…) highest/lowest value in vector or column

青春惊慌失措 2020-12-17 10:04

What is the fastest way to find the index of the second (third...) highest/lowest value in a vector or column?

i.e. what

sort(x,partial=n-1)[n-1]
         


        
7 answers
  •  心在旅途
    2020-12-17 10:41

    Method: replace every occurrence of the maximum with -Inf, then find the indices of the new maximum. No sorting is required.

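    X <- runif(1e7)
    system.time(
    {
      # Mask every copy of the current maximum (note: this modifies X in place)
      X[X == max(X)] <- -Inf
      # The new maximum is the second-highest value; return all of its positions
      which(X == max(X))
    })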
    

    This handles ties and is very fast.

    If you can guarantee that there are no ties, an even faster version is:

    system.time(
    {
      # which.max() returns only the first position of the maximum
      X[which.max(X)] <- -Inf
      which.max(X)
    })
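
To make the difference between the two variants concrete, here is a minimal sketch on a small vector with a tie at the second-highest value. The vector and the variable names `second_all` and `second_first` are made up for illustration; working copies are used so the original `x` is left untouched.

```r
# Small vector with a tie at the second-highest value (7 appears twice).
x <- c(3, 7, 9, 7, 1)

# Tie-aware version: mask every copy of the maximum, then collect all
# positions of the new maximum.
y <- x
y[y == max(y)] <- -Inf
second_all <- which(y == max(y))

# No-ties shortcut: which.max() reports only the first matching position.
z <- x
z[which.max(z)] <- -Inf
second_first <- which.max(z)

print(second_all)    # [1] 2 4
print(second_first)  # [1] 2
```

With ties present, the shortcut silently drops position 4, which is why it is only safe when the values are known to be distinct.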
    

    EDIT: As Joris mentioned, this method doesn't scale that well for finding third, fourth, etc., highest values.

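    which_nth_highest_richie <- function(x, n)
    {
      # Mask the current maximum n - 1 times (one full pass over x per step)
      for(i in seq_len(n - 1L)) x[x == max(x)] <- -Inf
      which(x == max(x))
    }
    
    which_nth_highest_joris <- function(x, n)
    {
      # Work on distinct values so that ties count as a single rank
      ux <- unique(x)
      nux <- length(ux)
      # Partial sort: only position nux - n + 1 is guaranteed to be in place
      which(x == sort(ux, partial = nux - n + 1)[nux - n + 1])
    }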
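
The question also asks about the lowest values. A possible mirror image of the partial-sort approach is sketched below; `which_nth_lowest` is a hypothetical name, not part of the original answer, and it assumes the same tie semantics (ranks are counted over distinct values) and no `NA`s in `x`.

```r
# Hypothetical helper: positions of the n-th lowest distinct value in x.
which_nth_lowest <- function(x, n)
{
  ux <- unique(x)                        # ties collapse to one distinct value
  stopifnot(n >= 1, n <= length(ux))     # n must point at an existing rank
  nth_value <- sort(ux, partial = n)[n]  # partial sort puts the n-th smallest at position n
  which(x == nth_value)                  # every position holding that value
}

x <- c(3, 7, 9, 7, 1)
print(which_nth_lowest(x, 1))  # [1] 5
print(which_nth_lowest(x, 3))  # [1] 2 4
```

Because `sort()` does not support `decreasing = TRUE` together with `partial`, the highest-value version has to index from the end (`nux - n + 1`), whereas the lowest-value version can use `n` directly.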
    

    With x <- runif(1e7) and n = 2, Richie's version wins:

    system.time(which_nth_highest_richie(x, 2))   # about half a second
    system.time(which_nth_highest_joris(x, 2))    # about 2 seconds
    

    For n = 100, Joris's version wins:

    system.time(which_nth_highest_richie(x, 100)) # about 20 seconds, ouch!
    system.time(which_nth_highest_joris(x, 100))  # still about 2 seconds
    

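    The break-even point, where both take about the same time, is around n = 10. The masking version makes one full pass over x for every rank it skips, so its cost grows with n, while the cost of the partial sort barely depends on n.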
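
One way to act on that observation is a small wrapper that picks a strategy based on n. This is only a sketch: the name `which_nth_highest` and the `cutoff` argument are hypothetical, and the default of 10 simply reuses the rough break-even figure above, which will differ by machine and data.

```r
# Hypothetical wrapper: mask-the-max for small n, partial sort otherwise.
which_nth_highest <- function(x, n, cutoff = 10L)
{
  if (n < cutoff) {
    # Masking strategy: one pass over x per skipped rank
    for (i in seq_len(n - 1L)) x[x == max(x)] <- -Inf
    which(x == max(x))
  } else {
    # Partial-sort strategy: cost is nearly independent of n
    ux <- unique(x)
    k <- length(ux) - n + 1L
    which(x == sort(ux, partial = k)[k])
  }
}

x <- c(3, 7, 9, 7, 1)
print(which_nth_highest(x, 2))               # [1] 2 4
print(which_nth_highest(x, 2, cutoff = 1L))  # [1] 2 4
```

Both branches return the same positions on this input; neither validates that n is at most the number of distinct values, so callers would need to check that themselves.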
