I would like to know whether foreachPartition will result in better performance, due to a higher level of parallelism, compared to the foreach m
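foreachPartition does not give you more parallelism than foreach: both run as one task per partition on the executors. It is mainly helpful when you're iterating through data that you are aggregating by partition.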
A good example is processing clickstreams per user. You'd want to clear your calculation cache every time you finish a user's stream of events, but keep it between records of the same user in order to calculate some user behavior insights.
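To make the clickstream case concrete, here is a minimal sketch of the kind of function you could hand to foreachPartition. It is plain Python so it runs without a cluster. The record shape `(user_id, page)`, the names `process_partition` and `emit`, and the "insight" (just the list of pages per user) are all made up for illustration. It also assumes each user's events arrive contiguously within the partition.

```python
from typing import Callable, Iterable, List, Optional, Tuple

# Hypothetical record shape: (user_id, page). Not from the original post.
Event = Tuple[str, str]


def process_partition(records: Iterable[Event],
                      emit: Callable[[str, List[str]], None]) -> None:
    """Walk one partition once, keeping a cache per user.

    Assumes all events of a user are contiguous in `records`.
    """
    current_user: Optional[str] = None
    cache: List[str] = []
    for user_id, page in records:
        if user_id != current_user:
            # Finished the previous user's stream: hand off and reset.
            if current_user is not None:
                emit(current_user, cache)
            current_user = user_id
            cache = []  # fresh cache for the next user
        # Same user as the previous record: keep accumulating.
        cache.append(page)
    # Flush the last user in the partition.
    if current_user is not None:
        emit(current_user, cache)


# Local stand-in for one partition's iterator.
results = {}
partition = [("alice", "/home"), ("alice", "/cart"), ("bob", "/home")]
process_partition(iter(partition),
                  lambda user, pages: results.update({user: len(pages)}))
```

In PySpark this could probably be wired up as `rdd.foreachPartition(lambda it: process_partition(it, emit))`, provided the data was first partitioned by user and sorted within each partition (for example with `repartitionAndSortWithinPartitions`). With plain foreach you only see one record at a time, so there is no natural place to keep the cache between records.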