Creating spark tasks from within tasks (map functions) on the same application

Submitted by 折月煮酒 on 2019-12-11 05:03:49

Question


Is it possible to do a map from within a mapper function (i.e. from tasks) in pyspark? In other words, is it possible to open "sub tasks" from a task? If so, how do I pass the sparkContext to the tasks - just as a variable?

I would like to have a job that is composed of many tasks - and each of these tasks should create many tasks as well, without going back to the driver.

My use case is like this: I am porting an application that was written using work queues to pyspark. In my old application, tasks created other tasks, and we relied on this functionality. I don't want to redesign the whole code because of the move to Spark (especially because I will have to make sure that both platforms work during the transition phase between the systems)...


Answer 1:


Is it possible to open "sub tasks" from a task?

No, at least not in a healthy manner*.

A task is a command sent from the driver, and Spark has one driver (central coordinator) that communicates with many distributed workers (executors).

As a result, what you are asking for implies that every task could play the role of a sub-driver. Not even a worker can do that; it would meet the same fate in my answer as the task does.
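To make the limitation concrete, here is a minimal PySpark sketch (not part of the original answer) of what happens if a task tries to use the SparkContext; the function name `process` is made up for illustration.

```python
from pyspark import SparkContext

sc = SparkContext(appName="nested-tasks-sketch")

def process(item):
    # Attempting to launch "sub tasks" from inside a task: this fails,
    # because the SparkContext lives only on the driver and cannot be
    # serialized and shipped to executors.
    return sc.parallelize(range(item)).collect()

rdd = sc.parallelize([1, 2, 3])
# rdd.map(process).collect()  # raises an error instead of running sub-jobs
```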

Related resources:

  1. What is a task in Spark? How does the Spark worker execute the jar file?
  2. What are workers, executors, cores in Spark Standalone cluster?

*By that I mean that I am not aware of any hack to achieve this, and if one exists, it would be too specific to rely on.
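As an aside (also not part of the original answer), the usual way to express this kind of fan-out without a sub-driver is to return the would-be sub-tasks as plain data and let the driver flatten them, for example with flatMap. Continuing the sketch above, `expand` is a made-up stand-in for whatever work each sub-task would spawn.

```python
def expand(item):
    # Return the would-be "sub tasks" as data; Spark distributes them for us.
    return [(item, i) for i in range(item)]

sub_items = sc.parallelize([1, 2, 3]).flatMap(expand)   # one level of fan-out, no sub-driver
results = sub_items.map(lambda pair: pair[0] * pair[1]).collect()
```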



Source: https://stackoverflow.com/questions/39061114/creating-spark-tasks-from-within-tasks-map-functions-on-the-same-application
