Apache Spark - foreach vs foreachPartition: when to use which?

别那么骄傲 2020-11-28 06:59

I would like to know whether foreachPartition will result in better performance, due to a higher level of parallelism, compared to the foreach m

5 answers
  •  盖世英雄少女心
    2020-11-28 07:30

    foreachPartition is not a per-node activity. The function is executed once for each partition, and you may well have many more partitions than nodes, in which case your performance may be degraded. If you intend to perform an activity at the node level, the solution explained here may be useful, although I have not tested it.
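
To make the per-partition point concrete, here is a minimal sketch in plain Python. It does not use Spark at all: `split_into_partitions`, `run_foreach`, `run_foreach_partition`, `FakeConnection` and `count_setups` are hypothetical names invented for this illustration, and the round-robin split is only an assumption about how records might be laid out. It counts how often an expensive setup step (for example, opening a connection) runs under each calling style.

```python
from typing import Callable, Iterator, List, Tuple


def split_into_partitions(records: List[int], num_partitions: int) -> List[List[int]]:
    """Round-robin split; a stand-in for a dataset divided into partitions."""
    return [records[i::num_partitions] for i in range(num_partitions)]


def run_foreach(partitions: List[List[int]], fn: Callable[[int], None]) -> None:
    """Model of foreach: the function is called once per record."""
    for part in partitions:
        for rec in part:
            fn(rec)


def run_foreach_partition(
    partitions: List[List[int]], fn: Callable[[Iterator[int]], None]
) -> None:
    """Model of foreachPartition: the function is called once per partition
    and receives an iterator over that partition's records."""
    for part in partitions:
        fn(iter(part))


class FakeConnection:
    """Hypothetical expensive resource; counts how many times it is opened."""

    opened = 0

    def __init__(self) -> None:
        FakeConnection.opened += 1
        self.sent: List[int] = []

    def send(self, rec: int) -> None:
        self.sent.append(rec)


def count_setups(records: List[int], num_partitions: int) -> Tuple[int, int]:
    """Return (setups with per-record calls, setups with per-partition calls)."""
    partitions = split_into_partitions(records, num_partitions)

    # Per-record style: a new connection for every single record.
    FakeConnection.opened = 0
    run_foreach(partitions, lambda rec: FakeConnection().send(rec))
    per_record = FakeConnection.opened

    # Per-partition style: one connection shared by all records of a partition.
    def write_partition(it: Iterator[int]) -> None:
        conn = FakeConnection()
        for rec in it:
            conn.send(rec)

    FakeConnection.opened = 0
    run_foreach_partition(partitions, write_partition)
    per_partition = FakeConnection.opened

    return per_record, per_partition


if __name__ == "__main__":
    # 1000 records in 8 partitions: 1000 setups versus 8 setups.
    print(count_setups(list(range(1000)), 8))
```

In this model the number of setups depends only on the number of partitions, never on the number of nodes. That matches the point above: with many more partitions than nodes, the setup still runs once per partition.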
