Spark difference between reduceByKey vs groupByKey vs aggregateByKey vs combineByKey

甜味超标 2020-12-04 06:15

Can anyone explain the difference between reduceByKey, groupByKey, aggregateByKey and combineByKey? I have read the documentation on these, but couldn't understand the exa

6 answers
  •  情书的邮戳
    2020-12-04 07:15

    Then apart from these 4, we have

    foldByKey, which is the same as reduceByKey but with a user-defined zero value.

    aggregateByKey takes 3 parameters as input: the first is the zero value, and the other 2 are merge functions (one for merging values within the same partition and another for merging results across partitions),

    whereas

    reduceByKey takes only 1 parameter, which is a function for merging.

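    combineByKey takes 3 parameters, and all 3 are functions. It is similar to aggregateByKey, except that the initial accumulator is produced by a function (applied to the first value seen for a key) rather than given as a fixed zero value.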
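To make the relationship between these four concrete, here is a minimal sketch in plain Python. It is not Spark code: the nested lists stand in for RDD partitions, and the helper names (`combine_by_key`, `aggregate_by_key`, `fold_by_key`, `reduce_by_key`) are hypothetical models of the operations, written under the assumption that the other three can be expressed through combineByKey. In PySpark the real calls would look like `rdd.combineByKey(create_combiner, merge_value, merge_combiners)` and `rdd.aggregateByKey(zero, seq_op, comb_op)`.

```python
from operator import add

# Two "partitions" of (key, value) pairs; plain lists stand in for an RDD.
partitions = [
    [("a", 1), ("b", 4), ("a", 3)],
    [("a", 5), ("b", 2)],
]

def combine_by_key(parts, create_combiner, merge_value, merge_combiners):
    """Model of combineByKey: build combiners per partition, then merge them."""
    per_partition = []
    for part in parts:
        acc = {}
        for k, v in part:
            # The first value for a key in this partition creates the combiner;
            # later values are folded in with merge_value.
            acc[k] = merge_value(acc[k], v) if k in acc else create_combiner(v)
        per_partition.append(acc)
    result = {}
    for acc in per_partition:
        for k, c in acc.items():
            # Combiners for the same key from different partitions are merged.
            result[k] = merge_combiners(result[k], c) if k in result else c
    return result

def aggregate_by_key(parts, zero, seq_op, comb_op):
    # Same as combine_by_key, but the combiner always starts from a fixed zero.
    return combine_by_key(parts, lambda v: seq_op(zero, v), seq_op, comb_op)

def fold_by_key(parts, zero, func):
    # One function is used both within and across partitions.
    return aggregate_by_key(parts, zero, func, func)

def reduce_by_key(parts, func):
    # No zero value: the first value itself is the initial combiner.
    return combine_by_key(parts, lambda v: v, func, func)

# (sum, count) per key, written both ways.
sum_count_agg = aggregate_by_key(
    partitions, (0, 0),
    lambda acc, v: (acc[0] + v, acc[1] + 1),   # merge a value within a partition
    lambda x, y: (x[0] + y[0], x[1] + y[1]),   # merge results across partitions
)
sum_count_cbk = combine_by_key(
    partitions,
    lambda v: (v, 1),                          # plays the role of the zero value
    lambda acc, v: (acc[0] + v, acc[1] + 1),
    lambda x, y: (x[0] + y[0], x[1] + y[1]),
)
totals_reduce = reduce_by_key(partitions, add)
totals_fold = fold_by_key(partitions, 0, add)
```

In this model the zero value is applied once per key in each partition, so it should be neutral for the merge function (0 for addition, an empty list for concatenation); otherwise it is counted more than once.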

    groupByKey takes no merge function and simply groups all values for each key. It also adds overhead in data transfer, because every value is moved across partitions without being combined first.
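The data-transfer overhead can be illustrated with a small plain-Python model. This is only a sketch, not Spark: the lists stand in for partitions, and `records_sent_group` and `records_sent_reduce` are hypothetical helpers that count how many records would cross the shuffle, assuming reduceByKey pre-merges values inside each partition while groupByKey sends every pair unchanged.

```python
from operator import add

# Word-count style input spread over two "partitions".
parts = [
    [("a", 1), ("a", 1), ("a", 1), ("b", 1)],
    [("a", 1), ("b", 1), ("b", 1)],
]

def records_sent_group(partitions):
    # groupByKey model: every (key, value) pair is shuffled as-is.
    return [kv for part in partitions for kv in part]

def records_sent_reduce(partitions, func):
    # reduceByKey model: values are merged per key inside each partition,
    # so at most one record per key leaves each partition.
    sent = []
    for part in partitions:
        local = {}
        for k, v in part:
            local[k] = func(local[k], v) if k in local else v
        sent.extend(local.items())
    return sent

def merge_after_shuffle(records, func):
    # Final merge on the receiving side; identical for both strategies.
    out = {}
    for k, v in records:
        out[k] = func(out[k], v) if k in out else v
    return out

group_sent = records_sent_group(parts)
reduce_sent = records_sent_reduce(parts, add)

group_result = merge_after_shuffle(group_sent, add)
reduce_result = merge_after_shuffle(reduce_sent, add)

print(len(group_sent), len(reduce_sent))  # 7 records vs 4 records
print(group_result == reduce_result)      # same final counts either way
```

Both paths give the same totals, but the pre-merged path moves fewer records, and the gap grows with the number of repeated keys per partition.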
