Understanding treeReduce() in Spark
Question: You can see the implementation here: https://github.com/apache/spark/blob/ffa05c84fe75663fc33f3d954d1cb1e084ab3280/python/pyspark/rdd.py#L804

How does it differ from the 'normal' reduce function? What does depth = 2 mean?

I don't want the reducer function to pass over the partitions linearly. Instead, I want it to reduce each available pair first, and then keep iterating that way until only one pair is left, reducing that to a single result, as shown in the picture:

Does treeReduce achieve that?

Answer 1: Standard
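For context, here is a minimal runnable sketch contrasting the two calls the question compares. The local SparkContext setup, the partition count, and the use of operator.add are illustrative assumptions, not taken from the linked source or the answer above.

```python
# Minimal sketch comparing reduce() and treeReduce() on a small RDD.
# Assumes a local PySpark installation; numbers and partitioning are arbitrary.
from operator import add

from pyspark import SparkContext

sc = SparkContext("local[4]", "treeReduceDemo")
rdd = sc.parallelize(range(1, 9), numSlices=8)  # 8 partitions

# 'Normal' reduce: each partition is reduced locally, then every partial
# result is sent to the driver and combined there in a single final step.
print(rdd.reduce(add))               # 36

# treeReduce: partial results are first combined on the executors in
# intermediate rounds of aggregation (a tree of the given depth), so the
# driver receives fewer values. depth=2 adds one intermediate level.
print(rdd.treeReduce(add, depth=2))  # 36

sc.stop()
```

Both calls produce the same value; they differ only in how the partial results are combined on the way to the driver.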