Why is zipped faster than zip in Scala?

南方客 2020-12-05 01:56

I have written some Scala code to perform an element-wise operation on a collection. Here I defined two methods that perform the same task. One method uses zip

4 answers
  •  一整个雨季
    2020-12-05 02:34

    Consider lazyZip

    (as lazyZip bs) map { case (a, b) => a + b }
    

    instead of zip

    (as zip bs) map { case (a, b) => a + b }
    

    Scala 2.13 added lazyZip as a replacement for .zipped:

    Together with .zip on views, this replaces .zipped (now deprecated). (scala/collection-strawman#223)

    zipped (and hence lazyZip) is faster than zip because, as explained by Tim and Mike Allen, zip is strict: zip followed by map performs two separate transformations, first building an intermediate collection of tuples and then mapping over it. zipped is lazy, so zipped followed by map performs a single transformation in one pass.
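
    To make the difference concrete, here is a minimal sketch, assuming Scala 2.13 or later. The object name ZipDemo, the sample values, and the names strict, fused and viaView are purely illustrative. All three expressions compute the same sums, but only the first builds an intermediate list of tuples before mapping.

    ```scala
    object ZipDemo {
      val as = List(1, 2, 3)
      val bs = List(10, 20, 30)

      // Strict: zip first builds List((1,10), (2,20), (3,30)), then map traverses it again.
      val strict: List[Int] = (as zip bs).map { case (a, b) => a + b }

      // Lazy: lazyZip defers the pairing, so map fills the result in a single pass.
      val fused: List[Int] = as.lazyZip(bs).map(_ + _)

      // Zipping a view also avoids the intermediate list; toList forces the result.
      val viaView: List[Int] = as.view.zip(bs).map { case (a, b) => a + b }.toList

      def main(args: Array[String]): Unit = {
        println(strict)  // List(11, 22, 33)
        println(fused)   // List(11, 22, 33)
        println(viaView) // List(11, 22, 33)
      }
    }
    ```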

    zipped gives Tuple2Zipped, and analysing Tuple2Zipped.map,

    class Tuple2Zipped[...](val colls: (It1, It2)) extends ... {
      private def coll1 = colls._1
      private def coll2 = colls._2
    
      def map[...](f: (El1, El2) => B)(...) = {
        val b = bf.newBuilder(coll1)
        ...
        val elems1 = coll1.iterator
        val elems2 = coll2.iterator
    
        while (elems1.hasNext && elems2.hasNext) {
          b += f(elems1.next(), elems2.next())
        }
    
        b.result()
      }
    

    we see the two collections coll1 and coll2 are iterated over and on each iteration the function f passed to map is applied along the way

    b += f(elems1.next(), elems2.next())
    

    without having to allocate and transform intermediary structures.
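
    The same single-pass idea can be sketched by hand. This is not the library implementation: zipWith is a hypothetical helper, and returning a Vector is an arbitrary choice made to keep the example short. It assumes Scala 2.13 or later.

    ```scala
    object ZipWithSketch {
      // Hypothetical helper: combine two collections element-wise in one pass,
      // without materialising a collection of tuples in between.
      def zipWith[A, B, C](xs: Iterable[A], ys: Iterable[B])(f: (A, B) => C): Vector[C] = {
        val b = Vector.newBuilder[C]
        val it1 = xs.iterator
        val it2 = ys.iterator
        // Stop as soon as either input is exhausted, as zip does.
        while (it1.hasNext && it2.hasNext) {
          b += f(it1.next(), it2.next())
        }
        b.result()
      }

      def main(args: Array[String]): Unit =
        println(zipWith(List(1, 2, 3), List(10, 20, 30))(_ + _)) // Vector(11, 22, 33)
    }
    ```

    Each element pair is read once and the result is appended straight to the builder, which is the same shape as the loop in Tuple2Zipped.map.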


    Applying Travis' benchmarking method, here is a comparison between the new lazyZip and the deprecated zipped:

    import org.openjdk.jmh.annotations._

    @State(Scope.Benchmark)
    @BenchmarkMode(Array(Mode.Throughput))
    class ZippedBench {
      import scala.collection.mutable._
      val as = ArraySeq.fill(10000)(math.random())
      val bs = ArraySeq.fill(10000)(math.random())

      // Scala 2.13 replacement for zipped
      def lazyZip(as: ArraySeq[Double], bs: ArraySeq[Double]): ArraySeq[Double] =
        as.lazyZip(bs).map { case (a, b) => a + b }

      // Deprecated since Scala 2.13
      def zipped(as: ArraySeq[Double], bs: ArraySeq[Double]): ArraySeq[Double] =
        (as, bs).zipped.map { case (a, b) => a + b }

      // Same operation on plain Array[Double]
      def lazyZipJavaArray(as: Array[Double], bs: Array[Double]): Array[Double] =
        as.lazyZip(bs).map { case (a, b) => a + b }

      @Benchmark def withZipped: ArraySeq[Double] = zipped(as, bs)
      @Benchmark def withLazyZip: ArraySeq[Double] = lazyZip(as, bs)
      // Note: the toArray conversions run inside the measured method
      @Benchmark def withLazyZipJavaArray: ArraySeq[Double] = lazyZipJavaArray(as.toArray, bs.toArray)
    }
    

    gives

    [info] Benchmark                          Mode  Cnt      Score      Error  Units
    [info] ZippedBench.withZipped            thrpt   20  20197.344 ± 1282.414  ops/s
    [info] ZippedBench.withLazyZip           thrpt   20  25468.458 ± 2720.860  ops/s
    [info] ZippedBench.withLazyZipJavaArray  thrpt   20   5215.621 ±  233.270  ops/s
    

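    On ArraySeq, lazyZip scores about a quarter higher than zipped (roughly 25,500 vs 20,200 ops/s). Interestingly, lazyZip on Array reaches only about a fifth of that throughput (roughly 5,200 ops/s). Note that withLazyZipJavaArray also calls as.toArray and bs.toArray inside the measured method, so part of that gap may come from the conversions rather than from lazyZip itself.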
