Why is Apache Spark's take function not parallel?

春和景丽
春和景丽 2020-12-19 17:29

Reading the Apache Spark guide at http://spark.apache.org/docs/latest/programming-guide.html, it states:


    take(n): Return an array with the first n elements of the dataset. Note that this is currently not executed in parallel. Instead, the driver program computes all the elements.

2 Answers
  •  再見小時候
    2020-12-19 18:18

    How would you implement it in parallel? Let's say you have 4 partitions and want to take the first 5 elements. If you knew in advance the size of each partition, it would be easy: for example, if each partition has 3 elements, the driver asks partition 0 for all of its elements and asks partition 1 for 2 elements. The problem is that it isn't known in advance how many elements each partition has.
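
    The sequential strategy described above can be sketched with plain Python lists standing in for partitions (a toy model, not Spark's actual code; the function name `take` here is just illustrative). The driver pulls from one partition at a time and stops as soon as it has enough elements:

    ```python
    def take(partitions, n):
        """Driver-side take: scan partitions in order, stopping early.

        partitions: list of lists, each standing in for one RDD partition.
        """
        result = []
        for part in partitions:
            if len(result) >= n:
                break  # already have n elements; remaining partitions are never computed
            result.extend(part[:n - len(result)])
        return result

    # take([[1, 2, 3], [4, 5, 6], [7], [8, 9]], 5) -> [1, 2, 3, 4, 5]
    ```

    As I understand it, Spark's real `RDD.take` refines this by running successive jobs: it first scans one partition, and if that yields too few elements it scans a growing batch of further partitions in the next round, so the common case touches only a small prefix of the data.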

    Now, you could first calculate the partition sizes, but this would require limiting the set of supported RDD transformations, computing elements more than once, or some other tradeoff, and would generally add communication overhead.
