Parallelize / avoid foreach loop in Spark

执念已碎 · 2020-12-09 06:02

I wrote a class that gets a DataFrame, does some calculations on it, and can export the results. The DataFrames are generated from a list of keys. I know that I am doing this i

3 Answers
  •  醉酒成梦
     2020-12-09 06:09

    You can use Scala's parallel collections to achieve foreach parallelism on the driver side.

    val l = List(34, 32, 132, 352).par
    l.foreach { i =>
      // your code to be run in parallel for each i
    }
    

    However, a word of caution: is your cluster capable of running jobs in parallel? You may submit the jobs to your Spark cluster in parallel, but they may end up getting queued on the cluster and executed sequentially.
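    A minimal, self-contained sketch of the pattern above: `.par` converts the key list into a parallel collection, so each iteration of `foreach` runs on a separate driver thread and can submit its own Spark job. The `processKey` helper is hypothetical, standing in for whatever builds and exports each key's DataFrame; Spark itself is omitted so the sketch runs on its own. Note that on Scala 2.13+ `.par` lives in the separate `scala-parallel-collections` module and needs the `CollectionConverters` import, while on 2.12 it is built in.

    ```scala
    // Scala 2.13+: requires the scala-parallel-collections module.
    import scala.collection.parallel.CollectionConverters._

    object ParallelJobs {
      // Placeholder for the real per-key work (build DataFrame, compute, export).
      def processKey(key: Int): Unit =
        println(s"processing key $key on ${Thread.currentThread.getName}")

      def main(args: Array[String]): Unit = {
        val keys = List(34, 32, 132, 352)

        // Each element is handled on a thread from the default ForkJoin pool;
        // with Spark, each call would submit an independent job.
        keys.par.foreach(processKey)
      }
    }
    ```

    Whether the submitted jobs actually overlap depends on the cluster's free resources and the application's scheduler configuration, as the caution above notes.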
