Simplest way to get the top n elements of a Scala Iterable

佛祖请我去吃肉 2020-11-29 02:48

Is there a simple and efficient solution to determine the top n elements of a Scala Iterable? I mean something like

iter.toList.sortBy(_.myAttr).take(2)
         


        
9 answers
  •  粉色の甜心
    2020-11-29 03:21

    Here is an asymptotically O(n) solution (expected time, in the style of quickselect).

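    import scala.annotation.tailrec
    import scala.util.Random

    def top[T](data: List[T], n: Int)(implicit ord: Ordering[T]): List[T] = {
        require(n >= 0 && n <= data.size)

        // candidates: shuffled elements still under consideration (the head acts as a random pivot)
        // k: how many more elements are needed; acc: elements already known to be in the top n
        @tailrec
        def partition_inner(candidates: List[T], k: Int, acc: List[T]): List[T] =
          candidates match {
              case _ if k == 0 => acc
              case Nil => acc // not reached while k <= candidates.size
              case pivot :: rest =>
                rest.partition(e => ord.gt(e, pivot)) match {
                    // exactly k elements beat the pivot: done
                    case (left, _) if left.size == k => left ::: acc
                    // too many: the answer lies entirely inside left
                    case (left, _) if left.size > k => partition_inner(left, k, acc)
                    // too few: keep left and the pivot, take the remainder from right
                    case (left, right) =>
                      partition_inner(right, k - left.size - 1, pivot :: left ::: acc)
                }
          }

        // Shuffle once; partition preserves order, so every sublist stays shuffled.
        partition_inner(Random.shuffle(data), n, Nil)
    }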
    
    scala> top(List.range(1,10000000), 5)
    

    Because every recursive step allocates fresh lists, this solution can in practice be slower than some of the non-linear solutions above, and on very large inputs it can fail with java.lang.OutOfMemoryError: GC overhead limit exceeded. Still, IMHO it is slightly more readable and more functional in style. Treat it as a job-interview answer ;).

    More importantly, this solution is easy to parallelize:

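    import scala.annotation.tailrec
    import scala.util.Random
    // Scala 2.13+ only: `.par` needs the scala-parallel-collections module and this import.
    // On Scala 2.12 and earlier `.par` is built in, so drop the import.
    import scala.collection.parallel.CollectionConverters._

    def top[T](data: List[T], n: Int)(implicit ord: Ordering[T]): List[T] = {
        require(n >= 0 && n <= data.size)

        // Same selection as the sequential version; only the partition step runs in parallel.
        @tailrec
        def partition_inner(candidates: List[T], k: Int, acc: List[T]): List[T] =
          candidates match {
              case _ if k == 0 => acc
              case Nil => acc // not reached while k <= candidates.size
              case pivot :: rest =>
                rest.par.partition(e => ord.gt(e, pivot)) match {
                    case (left, _) if left.size == k => left.toList ::: acc
                    case (left, _) if left.size > k => partition_inner(left.toList, k, acc)
                    case (left, right) =>
                      partition_inner(right.toList, k - left.size - 1, pivot :: left.toList ::: acc)
                }
          }

        // Shuffle once so that the head of each sublist is a random pivot.
        partition_inner(Random.shuffle(data), n, Nil)
    }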
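
For comparison with the non-linear solutions this answer mentions, here is a minimal sketch of one such approach. It assumes a bounded min-heap is the kind of solution meant, since the other answers are not shown here. The name `topByHeap` is hypothetical. It runs in O(N log n) time with O(n) extra memory and works on any Iterable in a single pass.

```scala
import scala.collection.mutable

// Hypothetical helper (not from the original answer): keeps at most n
// elements in a heap whose head is the smallest element retained so far.
def topByHeap[T](data: Iterable[T], n: Int)(implicit ord: Ordering[T]): List[T] = {
  require(n >= 0)
  // PriorityQueue puts the greatest element (by its ordering) at the head,
  // so the reversed ordering makes the head the minimum under `ord`.
  val heap = mutable.PriorityQueue.empty[T](ord.reverse)
  data.foreach { e =>
    if (heap.size < n) heap.enqueue(e)
    else if (n > 0 && ord.gt(e, heap.head)) {
      heap.dequeue() // evict the smallest retained element
      heap.enqueue(e)
    }
  }
  // Dequeue smallest-first and prepend, so the result is largest-first.
  var result = List.empty[T]
  while (heap.nonEmpty) result = heap.dequeue() :: result
  result
}

val best = topByHeap(List(5, 1, 9, 3, 7), 2) // List(9, 7)
```

Unlike the partition-based version, this never holds more than n elements beyond the input itself, which may matter more than the extra log n factor when n is small.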
    
