发表新帖

发表新帖

Explain the aggregate functionality in Spark

前端未结

关注

 9  2214

误落风尘 2020-12-07 12:23

I am looking for some better explanation of the aggregate functionality that is available via spark in python.

The example I have is as follows (using pyspark from

9条回答

情书的邮戳 (楼主)

2020-12-07 12:38

I don't have enough reputation points to comment on the previous answer by Maasg. Actually the zero value should be 'neutral' towards the seqop, meaning it wouldn't interfere with the seqop result, like 0 towards add, or 1 towards *;

You should NEVER try with non-neutral values as it might be applied arbitrary times. This behavior is not only tied to num of partitions.

I tried the same experiment as stated in the question. with 1 partition, the zero value was applied 3 times. with 2 partitions, 6 times. with 3 partitions, 9 times and this will go on.

0 讨论(0)

查看其它9个回答
发布评论:

提交评论
- 加载中...

热议问题