foldLeft or foldRight equivalent in Spark?

Question

In Spark's RDDs and DStreams we have the 'reduce' function for transforming an entire RDD into one element. However, reduce takes a function of type (T, T) => T. If we want to reduce a List in Scala, we can use foldLeft, which takes (B)((B, A) => B), or foldRight, which takes (B)((A, B) => B). This is very useful because you can start folding with a type other than the element type of your list.
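To make the shape concrete, here is a minimal plain-Scala sketch (no Spark involved) of a foldLeft whose accumulator type differs from the element type. The object and method names are made up for illustration.

```scala
object FoldLeftSketch {
  // A = Int (element type), B = String (accumulator type).
  // The zero value "" has type B, and the function has type (B, A) => B.
  def digitsToString(xs: List[Int]): String =
    xs.foldLeft("")((acc, x) => acc + x.toString)

  def main(args: Array[String]): Unit =
    println(digitsToString(List(1, 2, 3))) // prints 123
}
```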

Is there a way to do something similar in Spark, where I can start with a value whose type is different from that of the elements in the RDD itself?

Answer 1:

Use aggregate instead of reduce. It lets you specify a "zero" value of type B along with a function of the shape you want: (B, A) => B. Note that the partial aggregations computed separately on different executors also have to be merged, so a (B, B) => B function is required as well.
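As a rough sketch of what this looks like, the following folds an RDD of Int into a (sum, count) pair of Longs. It assumes spark-core is on the classpath and uses a local master. The object name, method name, app name and partition count are arbitrary choices for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object AggregateSketch {
  // A = Int (element type), B = (Long, Long) holding (sum, count).
  def sumAndCount(sc: SparkContext, values: Seq[Int]): (Long, Long) = {
    val rdd = sc.parallelize(values, numSlices = 4)
    rdd.aggregate((0L, 0L))(
      // seqOp: (B, A) => B, folds elements within one partition
      (acc, x) => (acc._1 + x, acc._2 + 1L),
      // combOp: (B, B) => B, merges the per-partition results
      (a, b) => (a._1 + b._1, a._2 + b._2)
    )
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("aggregate-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val (sum, count) = sumAndCount(sc, 1 to 10)
      println(s"sum=$sum count=$count") // prints sum=55 count=10
    } finally {
      sc.stop()
    }
  }
}
```

Unlike foldLeft on a List, the zero value is used once per partition and again when merging, and partitions are not combined in a guaranteed order. The zero should therefore be neutral for both functions, and the merge function should be associative and commutative.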

Alternatively, if you want this aggregation as a side effect, one option is to use an accumulator. In particular, the Accumulable type allows the result type to differ from the type of the values being accumulated.
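The Accumulable API mentioned here was deprecated in Spark 2.0 in favor of AccumulatorV2, so the sketch below uses the newer class rather than the one the answer names. It assumes Spark 2.x or later on the classpath. SumCountAccumulator and AccumulatorSketch are hypothetical names introduced only for this example.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.util.AccumulatorV2

// IN = Int (values added), OUT = (Long, Long) holding (sum, count).
class SumCountAccumulator extends AccumulatorV2[Int, (Long, Long)] {
  private var sum = 0L
  private var count = 0L

  override def isZero: Boolean = count == 0L && sum == 0L
  override def copy(): SumCountAccumulator = {
    val acc = new SumCountAccumulator
    acc.sum = sum
    acc.count = count
    acc
  }
  override def reset(): Unit = { sum = 0L; count = 0L }
  // Plays the role of (B, A) => B, but mutates in place.
  override def add(v: Int): Unit = { sum += v; count += 1L }
  // Plays the role of (B, B) => B when task results reach the driver.
  override def merge(other: AccumulatorV2[Int, (Long, Long)]): Unit = {
    val (s, c) = other.value
    sum += s
    count += c
  }
  override def value: (Long, Long) = (sum, count)
}

object AccumulatorSketch {
  def sumAndCount(sc: SparkContext, values: Seq[Int]): (Long, Long) = {
    val acc = new SumCountAccumulator
    sc.register(acc, "sumCount")
    // The aggregation happens as a side effect of the foreach action.
    sc.parallelize(values).foreach(v => acc.add(v))
    acc.value // read on the driver after the action has finished
  }
}
```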

Also, if you ever need to do the same with a key-value RDD, use aggregateByKey.
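A minimal sketch of the key-value case, collecting the distinct Int values seen for each String key into a Set. It assumes spark-core is on the classpath. The object and method names are illustrative only.

```scala
import org.apache.spark.SparkContext

object AggregateByKeySketch {
  // Values have type Int (A); the per-key result has type Set[Int] (B).
  def valuesPerKey(sc: SparkContext, pairs: Seq[(String, Int)]): Map[String, Set[Int]] =
    sc.parallelize(pairs)
      .aggregateByKey(Set.empty[Int])(
        (set, v) => set + v, // seqOp: (B, A) => B within a partition
        (a, b) => a ++ b     // combOp: (B, B) => B across partitions
      )
      .collect()
      .toMap
}
```

The result is collected to the driver here only to keep the example small; on a large dataset it would normally stay as an RDD of (key, aggregate) pairs.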

Source: https://stackoverflow.com/questions/31813973/foldleft-or-foldright-equivalent-in-spark

Tags

scala

apache-spark

spark-streaming

fold

rdd
