Can anyone explain the difference between reducebykey,groupbykey,aggregatebykey and combinebykey? I have read the documents regarding this , but couldn\'t understand the exa
Then apart from these 4, we have
foldByKey which is same as reduceByKey but with a user defined Zero Value.
AggregateByKey takes 3 parameters as input and uses 2 functions for merging(one for merging on same partitions and another to merge values across partition. The first parameter is ZeroValue)
whereas
ReduceBykey takes 1 parameter only which is a function for merging.
CombineByKey takes 3 parameter and all 3 are functions. Similar to aggregateBykey except it can have a function for ZeroValue.
GroupByKey takes no parameter and groups everything. Also, it is an overhead for data transfer across partitions.