Simplest way to get the top n elements of a Scala Iterable

佛祖请我去吃肉 2020-11-29 02:48

Is there a simple and efficient solution to determine the top n elements of a Scala Iterable? I mean something like

iter.toList.sortBy(_.myAttr).take(2)


        
9 answers
  • 2020-11-29 03:27

    Yet another version:

    val big = (1 to 100000)
    
    def maxes[A](n:Int)(l:Traversable[A])(implicit o:Ordering[A]) =
        l.foldLeft(collection.immutable.SortedSet.empty[A]) { (xs,y) =>
          if (xs.size < n) xs + y
          else {
            import o._
            val first = xs.firstKey
            if (first < y) xs - first + y
            else xs
          }
        }
    
    println(maxes(4)(big))
    println(maxes(2)(List("a","ab","c","z")))
    

    Using a Set forces the result to contain unique values. A List-based version keeps duplicates:

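    def maxes2[A](n:Int)(l:Traversable[A])(implicit o:Ordering[A]) =
        l.foldLeft(List.empty[A]) { (xs,y) =>
          import o._
          // xs is kept sorted ascending, so xs.head is the smallest retained value
          if (xs.size < n) (y::xs).sortWith(lt(_, _))
          else {
            val first = xs.head
            // List.sort and List.- no longer exist; drop the head and re-sort instead
            if (first < y) (y::xs.tail).sortWith(lt(_, _))
            else xs
          }
        }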
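
To make the difference between a Set-based and a List-based accumulator concrete, here is a minimal sketch, assuming Scala 2.13 or later. The names `UniqueVsDuplicates`, `topDistinct` and `topWithDuplicates` are hypothetical and exist only for this illustration; they are not part of the answer's code.

```scala
import scala.collection.immutable.SortedSet

object UniqueVsDuplicates {
  // Set-based variant: keeps the n largest *distinct* values.
  def topDistinct[A](n: Int)(xs: Iterable[A])(implicit ord: Ordering[A]): SortedSet[A] =
    xs.foldLeft(SortedSet.empty[A]) { (acc, y) =>
      if (acc.size < n) acc + y
      else if (acc.nonEmpty && ord.lt(acc.head, y)) acc - acc.head + y
      else acc
    }

  // List-based variant: keeps the n largest values, duplicates included.
  // The accumulator stays sorted ascending, so its head is the smallest kept value.
  def topWithDuplicates[A](n: Int)(xs: Iterable[A])(implicit ord: Ordering[A]): List[A] =
    xs.foldLeft(List.empty[A]) { (acc, y) =>
      if (acc.size < n) (y :: acc).sorted
      else if (acc.nonEmpty && ord.lt(acc.head, y)) (y :: acc.tail).sorted
      else acc
    }
}
```

For the input `List(3, 3, 1)` and `n = 2`, the Set-based variant ends up with the two distinct values 1 and 3, while the List-based variant keeps both copies of 3.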
    
  • 2020-11-29 03:29

    I recently implemented such a ranking algorithm in the Rank class of Apache Jackrabbit (in Java, though). See the take method for the gist of it. The basic idea is to quicksort but stop early, as soon as the top n elements have been found.
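
The "quicksort, but stop early" idea can be sketched in Scala roughly as follows. This is not the Jackrabbit code, only a small functional illustration under the assumption that "top n" means the first n elements in sort order, as in the question's `sortBy(...).take(n)`. The names `PartialQuickSort` and `takeSorted` are hypothetical. The recursion is not stack-safe for very long, already-sorted inputs.

```scala
object PartialQuickSort {
  // Returns the n smallest elements of xs in ascending order.
  def takeSorted[A](xs: List[A], n: Int)(implicit ord: Ordering[A]): List[A] =
    if (n <= 0) Nil
    else xs match {
      case Nil => Nil
      case pivot :: rest =>
        // Classic quicksort partition around the first element.
        val (smaller, notSmaller) = rest.partition(ord.lt(_, pivot))
        val left = takeSorted(smaller, n)
        // Enough elements found on the left: the right part is never sorted.
        if (left.size >= n) left
        else left ::: pivot :: takeSorted(notSmaller, n - left.size - 1)
    }
}
```

Passing `Ordering[Int].reverse` explicitly gives the n largest elements instead, for example `PartialQuickSort.takeSorted(List(4, 3, 6, 7, 1, 2, 9, 5), 2)(Ordering[Int].reverse)`.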

  • 2020-11-29 03:39

    My solution (bound to Int, but it should be easy to change to Ordered; a few minutes, please):

    def top (n: Int, li: List [Int]) : List[Int] = {
    
      def updateSofar (sofar: List [Int], el: Int) : List [Int] = {
        // println (el + " - " + sofar)
        if (el < sofar.head) 
          (el :: sofar.tail).sortWith (_ > _) 
        else sofar
      }
    
      /* better readable:
        val sofar = li.take (n).sortWith (_ > _)
        val rest = li.drop (n)
        (sofar /: rest) (updateSofar (_, _)) */    
      (li.take (n). sortWith (_ > _) /: li.drop (n)) (updateSofar (_, _)) 
    }
    

    usage:

    val li = List (4, 3, 6, 7, 1, 2, 9, 5)    
    top (2, li)
    
    • For the list above, take the first 2 elements (4, 3) as the starting TopTen (here: TopTwo).
    • Sort them so that the first element is the bigger one (if any).
    • Iterate through the rest of the list (li.drop(n)) and compare the current element with the maximum of the list of minimums; replace it if necessary, and re-sort.
    • Improvements:
      • Throw away Int, and use Ordered.
      • Throw away (_ > _) and use a user-supplied Ordering to allow BottomTen. (Harder: pick the middle 10 :) )
      • Throw away List, and use Iterable instead.

    update (abstraction):

    def extremeN [T](n: Int, li: List [T])
      (comp1: ((T, T) => Boolean), comp2: ((T, T) => Boolean)):
         List[T] = {
    
      def updateSofar (sofar: List [T], el: T) : List [T] =
        if (comp1 (el, sofar.head)) 
          (el :: sofar.tail).sortWith (comp2 (_, _)) 
        else sofar
    
      (li.take (n) .sortWith (comp2 (_, _)) /: li.drop (n)) (updateSofar (_, _)) 
    }
    
    /*  still bound to Int:  
    def top (n: Int, li: List [Int]) : List[Int] = {
      extremeN (n, li) ((_ < _), (_ > _))
    }
    def bottom (n: Int, li: List [Int]) : List[Int] = {
      extremeN (n, li) ((_ > _), (_ < _))
    }
    */
    
    def top [T] (n: Int, li: List [T]) 
      (implicit ord: Ordering[T]): Iterable[T] = {
      extremeN (n, li) (ord.lt (_, _), ord.gt (_, _))
    }
    def bottom [T] (n: Int, li: List [T])
      (implicit ord: Ordering[T]): Iterable[T] = {
      extremeN (n, li) (ord.gt (_, _), ord.lt (_, _))
    }
    
    top (3, li)
    bottom (3, li)
    val sl = List ("Haus", "Garten", "Boot", "Sumpf", "X", "y", "xkcd", "x11")
    bottom (2, sl)
    

    Replacing List with Iterable seems to be a bit harder.
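
One possible way to accept any Iterable is to fold the elements into a bounded heap instead of a sorted List. The following is only a sketch of that swapped-in technique, assuming a mutable `PriorityQueue` is acceptable; `IterableTopN` and `firstN` are hypothetical names. It returns what `xs.toList.sorted.take(n)` would return, without sorting the whole input.

```scala
import scala.collection.mutable

object IterableTopN {
  // The n smallest elements of any Iterable, in ascending order under ord.
  def firstN[T](n: Int, xs: Iterable[T])(implicit ord: Ordering[T]): List[T] =
    if (n <= 0) Nil
    else {
      // Max-heap under ord: heap.head is the largest element currently kept.
      val heap = mutable.PriorityQueue.empty[T]
      xs.foreach { x =>
        if (heap.size < n) heap.enqueue(x)
        else if (ord.lt(x, heap.head)) {
          heap.dequeue() // drop the current largest kept element
          heap.enqueue(x)
        }
      }
      // Iteration order of a PriorityQueue is unspecified, so sort the few survivors.
      heap.toList.sorted
    }
}
```

As with the List version, a reversed Ordering (for example `Ordering[Int].reverse`, passed explicitly) selects from the other end.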

    As Daniel C. Sobral pointed out in the comments, a large n in topN can cause a lot of sorting work, so it can be useful to do a manual insertion sort instead of repeatedly sorting the whole list of top-n elements:

    def extremeN [T](n: Int, li: List [T])
      (comp1: ((T, T) => Boolean), comp2: ((T, T) => Boolean)):
         List[T] = {
    
      def sortedIns (el: T, list: List[T]): List[T] = 
        if (list.isEmpty) List (el) else 
        if (comp2 (el, list.head)) el :: list else 
          list.head :: sortedIns (el, list.tail)
    
      def updateSofar (sofar: List [T], el: T) : List [T] =
        if (comp1 (el, sofar.head)) 
          sortedIns (el, sofar.tail)
        else sofar
    
      (li.take (n) .sortWith (comp2 (_, _)) /: li.drop (n)) (updateSofar (_, _)) 
    }
    

    The top/bottom methods and their usage are as above. For small groups of top/bottom elements, the sorting is rarely called: a few times in the beginning, and then less and less often over time. For example, 70 times with top (10) of 10 000 elements, and 90 times with top (10) of 100 000.
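
The "less and less often" behaviour can be checked with a small counting sketch. It assumes distinct values in uniformly random order, in which case the i-th element (for i > n) belongs to the n smallest seen so far with probability n/i, so the expected number of updates is n·(H_N − H_n), where H_k is the k-th harmonic number. `UpdateCount`, `countUpdates` and `expectedUpdates` are hypothetical helpers written for this illustration.

```scala
object UpdateCount {
  // Counts how often a running "n smallest so far" list has to change
  // while scanning xs from left to right.
  def countUpdates(n: Int, xs: Seq[Int]): Int = {
    require(n > 0, "n must be positive")
    val descending = Ordering[Int].reverse
    var kept: List[Int] = xs.take(n).toList.sorted(descending) // head = largest kept value
    var updates = 0
    xs.drop(n).foreach { el =>
      if (el < kept.head) {
        kept = (el :: kept.tail).sorted(descending)
        updates += 1
      }
    }
    updates
  }

  // Expected number of updates for distinct values in random order:
  // the sum of n / i for i = n + 1 .. total, i.e. n * (H_total - H_n).
  def expectedUpdates(n: Int, total: Int): Double =
    (n + 1 to total).map(i => n.toDouble / i).sum
}
```

Under these assumptions the estimate comes to about 69 updates for n = 10 and 10 000 elements, and about 92 for 100 000 elements, which is in line with the counts quoted above. A shuffled input such as `scala.util.Random.shuffle((1 to 10000).toList)` can be passed to `countUpdates` to compare an actual run against the estimate.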
