Example of the Scala aggregate function

前端 未结 7 936
傲寒
傲寒 2020-12-22 19:07

I have been looking and I cannot find an example or discussion of the aggregate function in Scala that I can understand. It seems pretty powerful.

Can t

7条回答
  •  一生所求
    2020-12-22 19:34

    Let's see if some ascii art doesn't help. Consider the type signature of aggregate:

    def aggregate [B] (z: B)(seqop: (B, A) ⇒ B, combop: (B, B) ⇒ B): B
    

    Also, note that A refers to the type of the collection. So, let's say we have 4 elements in this collection, then aggregate might work like this:

    z   A   z   A   z   A   z   A
     \ /     \ /seqop\ /     \ /    
      B       B       B       B
        \   /  combop   \   /
          B _           _ B
             \ combop  /
                  B
    

    Let's see a practical example of that. Say I have a GenSeq("This", "is", "an", "example"), and I want to know how many characters there are in it. I can write the following:

    Note the use of par in the below snippet of code. The second function passed to aggregate is what is called after the individual sequences are computed. Scala is only able to do this for sets that can be parallelized.

    import scala.collection.GenSeq
    val seq = GenSeq("This", "is", "an", "example")
    val chars = seq.par.aggregate(0)(_ + _.length, _ + _)
    

    So, first it would compute this:

    0 + "This".length     // 4
    0 + "is".length       // 2
    0 + "an".length       // 2
    0 + "example".length  // 7
    

    What it does next cannot be predicted (there are more than one way of combining the results), but it might do this (like in the ascii art above):

    4 + 2 // 6
    2 + 7 // 9
    

    At which point it concludes with

    6 + 9 // 15
    

    which gives the final result. Now, this is a bit similar in structure to foldLeft, but it has an additional function (B, B) => B, which fold doesn't have. This function, however, enables it to work in parallel!

    Consider, for example, that each of the four computations initial computations are independent of each other, and can be done in parallel. The next two (resulting in 6 and 9) can be started once their computations on which they depend are finished, but these two can also run in parallel.

    The 7 computations, parallelized as above, could take as little as the same time 3 serial computations.

    Actually, with such a small collection the cost in synchronizing computation would be big enough to wipe out any gains. Furthermore, if you folded this, it would only take 4 computations total. Once your collections get larger, however, you start to see some real gains.

    Consider, on the other hand, foldLeft. Because it doesn't have the additional function, it cannot parallelize any computation:

    (((0 + "This".length) + "is".length) + "an".length) + "example".length
    

    Each of the inner parenthesis must be computed before the outer one can proceed.

提交回复
热议问题