Is Scala idiomatic coding style just a cool trap for writing inefficient code?

Backend · 10 answers · 729 views
借酒劲吻你 asked 2020-12-22 17:19

I sense that the Scala community has a bit of an obsession with writing "concise", "cool", "Scala-idiomatic", "one-liner" (if possible) code. This

10 Answers
  • 2020-12-22 17:27

    For reference, here's how splitAt is defined in TraversableLike in the Scala standard library:

    def splitAt(n: Int): (Repr, Repr) = {
      val l, r = newBuilder
      l.sizeHintBounded(n, this)
      if (n >= 0) r.sizeHint(this, -n)
      var i = 0
      for (x <- this) {
        (if (i < n) l else r) += x
        i += 1
      }
      (l.result, r.result)
    }
    

    It's not unlike your example code of what a Java programmer might come up with.

    I like Scala because, where performance matters, mutability is a reasonable way to go. The collections library is a great example, especially in how it hides this mutability behind a functional interface.

    Where performance isn't as important, such as some application code, the higher order functions in Scala's library allow great expressivity and programmer efficiency.
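    To illustrate that expressivity (a hedged sketch of my own, not code from the thread; the function names are made up):

```scala
// Hypothetical example: sum of the squares of the even numbers in a list.

// Imperative style: explicit loop and mutable accumulator.
def sumSquaresLoop(xs: List[Int]): Int = {
  var acc = 0
  for (x <- xs) if (x % 2 == 0) acc += x * x
  acc
}

// Higher-order style: shorter, and the intent reads directly off the code.
def sumSquares(xs: List[Int]): Int =
  xs.filter(_ % 2 == 0).map(x => x * x).sum
```

    Both compute the same result; the second trades a little raw speed for clarity, which is exactly the trade-off discussed here.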


    Out of curiosity, I picked an arbitrary large file in the Scala compiler (scala.tools.nsc.typechecker.Typers.scala) and counted something like 37 for loops, 11 while loops, 6 concatenations (++), and 1 fold (it happens to be a foldRight).

  • 2020-12-22 17:28
    def removeOneMax (xs: List [Int]) : List [Int] = xs match {                                  
        case x :: Nil => Nil 
        case a :: b :: xs => if (a < b) a :: removeOneMax (b :: xs) else b :: removeOneMax (a :: xs) 
        case Nil => Nil 
    }
    

    Here is a recursive method that iterates only once. If you need performance, you have to think about it; if not, you don't.

    You can make it tail-recursive in the standard way: giving an extra parameter carry, which is per default the empty List, and collects the result while iterating. That is, of course, a bit longer, but if you need performance, you have to pay for it:

    import annotation.tailrec

    @tailrec
    def removeOneMax (xs: List [Int], carry: List [Int] = List.empty) : List [Int] = xs match {
      case a :: b :: rest => if (a < b) removeOneMax (b :: rest, a :: carry) else removeOneMax (a :: rest, b :: carry)
      case x :: Nil => carry // x is the remaining maximum; note the result is in reverse order of the input
      case Nil => Nil
    }
    

    I don't know the chances that later compilers will optimize slower map calls to be as fast as while loops. However, you rarely need high-speed solutions, and if you need them often, you will learn them fast.

    Do you know how big your collection has to be before your solution takes a whole second on your machine?

    As a one-liner, similar to Daniel C. Sobral's solution:

    ((Nil : List[Int], xs(0)) /: xs.tail) ((p, x)=> if (p._2 > x) (x :: p._1, p._2) else ((p._2 :: p._1), x))._1
    

    but that is hard to read, and I didn't measure the effective performance. The usual pattern is (x /: xs) ((a, b) => /* something */). Here, x and a are pairs of list-so-far and max-so-far, which makes it possible to squeeze everything into one line of code, but it isn't very readable. However, you can earn reputation on CodeGolf this way, and maybe someone would like to take a performance measurement.
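    For readability, the same fold can be written with an explicit foldLeft and a named pattern for the accumulator (a sketch of mine; /: is just symbolic syntax for foldLeft, and was deprecated in later Scala versions):

```scala
// Same algorithm as the one-liner: fold over the tail, carrying
// (elements-kept-so-far, max-so-far); drop the final max at the end.
def removeOneMax(xs: List[Int]): List[Int] =
  xs.tail.foldLeft((List.empty[Int], xs.head)) {
    case ((acc, max), x) =>
      if (max > x) (x :: acc, max)   // keep current max, cons x onto result
      else         (max :: acc, x)   // x is the new max; old max joins result
  }._1
```

    Like the one-liner, the result comes out in reverse order, and it throws on an empty list because of xs.head.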

    And now to our big surprise, some measurements:

    An updated timing method, to get garbage collection out of the way and to let the HotSpot compiler warm up, plus a main and many methods from this thread, together in an object named

    object PerfRemMax {

      def timed (name: String, xs: List [Int]) (f: List [Int] => List [Int]) = {
        val a = System.currentTimeMillis
        val res = f (xs)
        val z = System.currentTimeMillis
        val delta = z - a
        println (name + ": " + (delta / 1000.0))
        res
      }

      def main (args: Array [String]) : Unit = {
        val n = args(0).toInt
        val funs : List [(String, List[Int] => List[Int])] = List (
          "indexOf/take-drop" -> adrian1 _,
          "arraybuf"          -> adrian2 _, /* out of memory */
          "paradigmatic1"     -> pm1 _,
          "paradigmatic2"     -> pm2 _,
          // "match"          -> uu1 _, /* out of memory */
          "tailrec match"     -> uu2 _,
          "foldLeft"          -> uu3 _,
          "buf-=buf.max"      -> soc1 _,
          "for/yield"         -> soc2 _,
          "splitAt"           -> daniel1 _,
          "ListBuffer"        -> daniel2 _
        )

        val r = util.Random
        val xs = (for (x <- 1 to n) yield r.nextInt (n)).toList

        // With 1 million as param, it runs the 100k, 200k, 300k, ... 1M cases.
        // a) warmup
        // b) look where the process gets linear in size
        funs.foreach (f => {
          (1 to 10) foreach (i => {
            timed (f._1, xs.take (n/10 * i)) (f._2)
            compat.Platform.collectGarbage
          })
          println ()
        })
      }
    }
    

    I renamed all the methods, and had to modify uu2 a bit to fit the common method signature (List [Int] => List [Int]).

    From the long result, I only provide the output for 1M invocations:

    scala -Dserver PerfRemMax 2000000
    indexOf/take-drop:  0.882
    arraybuf:   1.681
    paradigmatic1:  0.55
    paradigmatic2:  1.13
    tailrec match: 0.812
    foldLeft:   1.054
    buf-=buf.max:   1.185
    for/yield:  0.725
    splitAt:    1.127
    ListBuffer: 0.61
    

    The numbers aren't completely stable; they depend on the sample size and vary a bit from run to run. For example, for runs of 100k to 1M elements, in steps of 100k, the timing for splitAt was as follows:

    splitAt: 0.109
    splitAt: 0.118
    splitAt: 0.129
    splitAt: 0.139
    splitAt: 0.157
    splitAt: 0.166
    splitAt: 0.749
    splitAt: 0.752
    splitAt: 1.444
    splitAt: 1.127
    

    The initial solution is already pretty fast. splitAt is a modification from Daniel, often faster, but not always.

    The measurement was done on a single-core 2 GHz Centrino, running Xubuntu Linux, with Scala 2.8 on Sun Java 1.6 (desktop).

    The two lessons for me are:

    • always measure your performance improvements; they are very hard to estimate if you don't do it on a daily basis
    • writing functional code is not only fun; sometimes the result is even faster

    Here is a link to my benchmark code, if somebody is interested.

  • 2020-12-22 17:28

    Try this:

    (myList.foldLeft((List[Int](), None: Option[Int])) {
      case ((_, None),      x) => (List(),               Some(x))
      case ((Nil, Some(m)), x) => (List(Math.min(x, m)), Some(Math.max(x, m)))
      case ((l, Some(m)),   x) => (Math.min(x, m) :: l,  Some(Math.max(x, m)))
    })._1
    

    Idiomatic, functional, traverses only once. Maybe somewhat cryptic if you are not used to functional-programming idioms.

    Let's try to explain what is happening here. I will try to make it as simple as possible, lacking some rigor.

    A fold is an operation on a List[A] (that is, a list that contains elements of type A) that will take an initial state s0: S (that is, an instance of a type S) and a function f: (S, A) => S (that is, a function that takes the current state and an element from the list, and gives the next state, i.e., it updates the state according to the next element).

    The operation will then iterate over the elements of the list, using each one to update the state according to the given function. In Java, it would be something like:

    interface Function<T, R> { R apply(T t); }
    class Pair<A, B> { ... }

    <A, State> State fold(List<A> list, State s0, Function<Pair<A, State>, State> f) {
      State s = s0;
      for (A a : list) {
        s = f.apply(new Pair<A, State>(a, s));
      }
      return s;
    }
    

    For example, if you want to add all the elements of a List[Int], the state would be the partial sum, that would have to be initialized to 0, and the new state produced by a function would simply add the current state to the current element being processed:

    myList.foldLeft(0)((partialSum, element) => partialSum + element)
    

    Try to write a fold to multiply the elements of a list, then another one to find extreme values (max, min).
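    Sketches of those two exercises, using the same foldLeft pattern (the function names are mine):

```scala
// Product of all elements: the state is the partial product, seeded with 1.
def product(xs: List[Int]): Int =
  xs.foldLeft(1)((acc, x) => acc * x)

// Extreme values in one pass: the state is a (min, max) pair seeded from
// the head. Like other head-based solutions, this throws on an empty list.
def extremes(xs: List[Int]): (Int, Int) =
  xs.tail.foldLeft((xs.head, xs.head)) {
    case ((mn, mx), x) => (math.min(mn, x), math.max(mx, x))
  }
```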

    Now, the fold presented above is a bit more complex, since the state is composed of the new list being created along with the maximum element found so far. The function that updates the state is more or less straightforward once you grasp these concepts. It simply puts into the new list the minimum between the current maximum and the current element, while the other value goes to the current maximum of the updated state.

    What is a bit harder than understanding this (if you have no FP background) is coming up with such a solution yourself. However, this is only to show you that it exists and can be done. It's just a completely different mindset.

    EDIT: As you see, the first and second cases in the solution I proposed are used to set up the fold. It is equivalent to what you see in other answers when they do xs.tail.fold((xs.head, ...)) {...}. Note that the solutions proposed so far using xs.tail/xs.head don't cover the case in which xs is List(), and will throw an exception. The solution above will return List() instead. Since you didn't specify the behavior of the function on empty lists, both are valid.
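    To illustrate that difference on the empty list (a sketch; removeMaxFold is my name for a compact two-case variant of the fold above, which folds the Nil setup case into the general one):

```scala
// State is (elements kept so far, max seen so far as an Option).
// On an empty list the fold never runs and the initial Nil is returned.
def removeMaxFold(xs: List[Int]): List[Int] =
  xs.foldLeft((List.empty[Int], Option.empty[Int])) {
    case ((_, None),    x) => (Nil, Some(x))
    case ((l, Some(m)), x) => (math.min(x, m) :: l, Some(math.max(x, m)))
  }._1

removeMaxFold(Nil)          // List(), no exception
removeMaxFold(List(1, 3, 2)) // keeps 1 and 2 (in reverse order), drops the max 3
```

    A head/tail-based version would instead throw NoSuchElementException when given Nil.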

  • 2020-12-22 17:34

    The biggest inefficiency when you're writing a program is worrying about the wrong things, and micro-performance is usually the wrong thing to worry about. Why?

    1. Developer time is generally much more expensive than CPU time — in fact, there is usually a dearth of the former and a surplus of the latter.

    2. Most code does not need to be very efficient because it will never be running on million-item datasets multiple times every second.

    3. Most code does need to be bug-free, and less code means less room for bugs to hide.

  • 2020-12-22 17:34

    What about this?

    def removeMax(xs: List[Int]) = {
      val buf = xs.toBuffer
      buf -= (buf.max)
    }
    

    A bit uglier, but faster:

    def removeMax(xs: List[Int]) = {
      var max = xs.head
      for ( x <- xs.tail ) 
      yield {
        if (x > max) { val result = max; max = x; result}
        else x
      }
    }
    
  • 2020-12-22 17:34

    Another option would be:

    package code.array
    
    object SliceArrays {
      def main(args: Array[String]): Unit = {
        println(removeMaxCool(Vector(1,2,3,100,12,23,44)))
      }
      def removeMaxCool(xs: Vector[Int]) = xs.filter(_ < xs.max)
    }
    

    This uses Vector instead of List, because Vector is more versatile and has better general performance and time complexity compared to List.

    Consider the following collection operations: head, tail, apply, update, prepend, append.

    Vector takes effectively constant time for all of these operations, as per the Scala docs: "The operation takes effectively constant time, but this might depend on some assumptions such as maximum length of a vector or distribution of hash keys."

    While List takes constant time only for head, tail and prepend operations.
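    A quick sketch of the operations in question (values and indices are arbitrary):

```scala
val v = Vector(1, 2, 3, 4)
v(2)            // apply: effectively constant time
v.updated(2, 9) // update: effectively constant time, returns a new Vector
v :+ 5          // append: effectively constant time
0 +: v          // prepend: effectively constant time

val l = List(1, 2, 3, 4)
l.head          // constant time
l.tail          // constant time
0 :: l          // prepend: constant time
l(2)            // apply: linear time, walks the list
l :+ 5          // append: linear time, copies the whole list
```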

    Using

    scalac -print

    generates:

    package code.array {
      object SliceArrays extends Object {
        def main(args: Array[String]): Unit = scala.Predef.println(SliceArrays.this.removeMaxCool(scala.`package`.Vector().apply(scala.Predef.wrapIntArray(Array[Int]{1, 2, 3, 100, 12, 23, 44})).$asInstanceOf[scala.collection.immutable.Vector]()));
        def removeMaxCool(xs: scala.collection.immutable.Vector): scala.collection.immutable.Vector = xs.filter({
      ((x$1: Int) => SliceArrays.this.$anonfun$removeMaxCool$1(xs, x$1))
    }).$asInstanceOf[scala.collection.immutable.Vector]();
        final <artifact> private[this] def $anonfun$removeMaxCool$1(xs$1: scala.collection.immutable.Vector, x$1: Int): Boolean = x$1.<(scala.Int.unbox(xs$1.max(scala.math.Ordering$Int)));
        def <init>(): code.array.SliceArrays.type = {
          SliceArrays.super.<init>();
          ()
        }
      }
    }
    