I have written some Scala code to perform an element-wise operation on a collection. Here I defined two methods that perform the same task. One method uses zip
Consider lazyZip
(as lazyZip bs) map { case (a, b) => a + b }
instead of zip
(as zip bs) map { case (a, b) => a + b }
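As a quick sanity check that the two forms are interchangeable, here is a minimal sketch (assuming Scala 2.13 or later; the object name `LazyZipVsZip` and the sample values are made up for illustration):

```scala
object LazyZipVsZip {
  val as = Vector(1.0, 2.0, 3.0)
  val bs = Vector(10.0, 20.0, 30.0)

  // Strict: zip first builds a Vector of tuples, then map traverses that Vector.
  val viaZip: Vector[Double] = as.zip(bs).map { case (a, b) => a + b }

  // Lazy: lazyZip only wraps the two collections; map performs the single pass.
  val viaLazyZip: Vector[Double] = as.lazyZip(bs).map(_ + _)

  def main(args: Array[String]): Unit = {
    println(viaZip)
    println(viaLazyZip)
  }
}
```

Both values hold the same element-wise sums; only the way they are computed differs.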
Scala 2.13 added lazyZip in favour of .zipped:
Together with .zip on views, this replaces .zipped (now deprecated). (scala/collection-strawman#223)
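The "zip on views" alternative mentioned in that note can be sketched as follows (a minimal illustration assuming Scala 2.13; `ViewZipDemo` and the sample lists are hypothetical names chosen here):

```scala
object ViewZipDemo {
  val as = List(1, 2, 3)
  val bs = List(10, 20, 30)

  // On a view, zip and map are both lazy, so no intermediate List of tuples
  // is built; the pipeline only runs when toList forces it.
  val summed: List[Int] =
    as.view.zip(bs).map { case (a, b) => a + b }.toList

  def main(args: Array[String]): Unit =
    println(summed)
}
```

Unlike lazyZip, the view approach needs an explicit conversion such as toList at the end to get a strict collection back.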
zipped (and hence lazyZip) is faster than zip because, as explained by Tim and Mike Allen, zip followed by map results in two separate transformations due to strictness (zip first builds an intermediate collection of tuples, which map then traverses), whilst zipped followed by map results in a single transformation executed in one go due to laziness.
zipped gives Tuple2Zipped, and analysing Tuple2Zipped.map,
class Tuple2Zipped[...](val colls: (It1, It2)) extends ... {
  private def coll1 = colls._1
  private def coll2 = colls._2
  def map[...](f: (El1, El2) => B)(...) = {
    val b = bf.newBuilder(coll1)
    ...
    val elems1 = coll1.iterator
    val elems2 = coll2.iterator
    while (elems1.hasNext && elems2.hasNext) {
      b += f(elems1.next(), elems2.next())
    }
    b.result()
  }
  ...
}
we see the two collections coll1 and coll2 are iterated over, and on each iteration the function f passed to map is applied along the way
b += f(elems1.next(), elems2.next())
without having to allocate and transform intermediary structures.
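The same single-pass idea can be written by hand. The following is only a sketch of the technique, not the library's implementation; `zipWith` and `FusedZipMap` are hypothetical names, and the result type is fixed to Vector to keep the example short:

```scala
object FusedZipMap {
  // Hypothetical helper (not part of the standard library): walks both
  // collections in lockstep and applies f directly, so no tuples and no
  // intermediate collection are allocated.
  def zipWith[A, B, C](as: Iterable[A], bs: Iterable[B])(f: (A, B) => C): Vector[C] = {
    val builder = Vector.newBuilder[C]
    val it1 = as.iterator
    val it2 = bs.iterator
    // Stop as soon as either side is exhausted, as Tuple2Zipped.map does.
    while (it1.hasNext && it2.hasNext) {
      builder += f(it1.next(), it2.next())
    }
    builder.result()
  }

  // The shorter collection determines the length of the result.
  val sums: Vector[Int] = zipWith(List(1, 2, 3), List(10, 20))(_ + _)

  def main(args: Array[String]): Unit =
    println(sums)
}
```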
Applying Travis' benchmarking method, here is a comparison between the new lazyZip and the deprecated zipped, where
import org.openjdk.jmh.annotations._

@State(Scope.Benchmark)
@BenchmarkMode(Array(Mode.Throughput))
class ZippedBench {
  import scala.collection.mutable._

  val as = ArraySeq.fill(10000)(math.random())
  val bs = ArraySeq.fill(10000)(math.random())

  def lazyZip(as: ArraySeq[Double], bs: ArraySeq[Double]): ArraySeq[Double] =
    as.lazyZip(bs).map { case (a, b) => a + b }

  def zipped(as: ArraySeq[Double], bs: ArraySeq[Double]): ArraySeq[Double] =
    (as, bs).zipped.map { case (a, b) => a + b }

  def lazyZipJavaArray(as: Array[Double], bs: Array[Double]): Array[Double] =
    as.lazyZip(bs).map { case (a, b) => a + b }

  @Benchmark def withZipped: ArraySeq[Double] = zipped(as, bs)
  @Benchmark def withLazyZip: ArraySeq[Double] = lazyZip(as, bs)
  // Note: toArray runs inside the measured method, and the Array result is
  // implicitly wrapped back into an ArraySeq to match the declared return type.
  @Benchmark def withLazyZipJavaArray: ArraySeq[Double] = lazyZipJavaArray(as.toArray, bs.toArray)
}
gives
[info] Benchmark                          Mode  Cnt      Score      Error  Units
[info] ZippedBench.withZipped            thrpt   20  20197.344 ± 1282.414  ops/s
[info] ZippedBench.withLazyZip           thrpt   20  25468.458 ± 2720.860  ops/s
[info] ZippedBench.withLazyZipJavaArray  thrpt   20   5215.621 ±  233.270  ops/s
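lazyZip seems to perform a bit better than zipped on ArraySeq. Interestingly, performance degrades significantly when lazyZip is used on Array. Note, however, that withLazyZipJavaArray also calls as.toArray and bs.toArray inside the measured method, so part of that gap may come from the conversions rather than from lazyZip itself.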