How to know which worker a partition is executed at?

Submitted by 前提是你 on 2019-12-11 01:59:36

Question


I'm just trying to find a way to get the locality of an RDD's partitions in Spark.

After calling RDD.repartition() or PairRDD.combineByKey(), the returned RDD is partitioned. I'd like to know which worker instances the partitions end up on (to examine the partitioning behaviour).

Can someone give me a clue?
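For context, a minimal version of the setup I mean (a sketch with hypothetical names, just to illustrate the question):

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(
  new SparkConf().setMaster("local[4]").setAppName("partition-locality"))

// Either of these returns a (re)partitioned RDD:
val repartitioned = sc.parallelize(1 to 1000).repartition(8)

val combined = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)))
  .combineByKey[List[Int]](
    v => List(v),          // createCombiner
    (acc, v) => v :: acc,  // mergeValue
    (a, b) => a ++ b)      // mergeCombiners

// Question: which worker instance does each of the 8 partitions live on?
sc.stop()
```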


Answer 1:


An interesting question that, I'm sure, has a not-so-interesting answer :)

First of all, applying transformations to your RDD has nothing to do with worker instances, as they are separate "entities". Transformations create an RDD lineage (i.e. a logical plan), while executors come on stage (no pun intended) only after an action is executed, when the DAGScheduler transforms the logical plan into a physical execution plan as a set of stages with tasks.
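To see the lineage side of this, you can print an RDD's logical plan with `toDebugString` before any action runs (a minimal local sketch; the exact output depends on your Spark version):

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(
  new SparkConf().setMaster("local[2]").setAppName("lineage-demo"))

val rdd = sc.parallelize(1 to 100, numSlices = 4)
  .map(n => (n % 10, n))
  .repartition(8) // only extends the lineage; no executors involved yet

// Prints the RDD lineage (logical plan); no tasks have run at this point.
println(rdd.toDebugString)

sc.stop()
```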

So, I believe the only way to know which executor a partition is executed on is to use org.apache.spark.SparkEnv to access the BlockManager that corresponds to a single executor. That's exactly how Spark knows/tracks executors (by their BlockManagers).
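One hands-on way to apply this (a sketch, assuming a running SparkContext named `sc`; note that SparkEnv is a developer API, so its shape may vary across Spark versions) is to read the executor id from inside each task:

```scala
import org.apache.spark.SparkEnv

// For each partition, record where its task actually ran.
// SparkEnv.get is task-local, so executorId identifies the hosting executor.
val placements = sc.parallelize(1 to 100, numSlices = 4)
  .repartition(8)
  .mapPartitionsWithIndex { (partitionIndex, iter) =>
    val executorId = SparkEnv.get.executorId // "driver" in local mode
    val host = java.net.InetAddress.getLocalHost.getHostName
    Iterator((partitionIndex, executorId, host))
  }
  .collect()

placements.foreach { case (p, exec, host) =>
  println(s"partition $p ran on executor $exec @ $host")
}
```

Keep in mind this reports where the partition was computed for that particular job; a later action may schedule the same partition elsewhere.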

You could also write an org.apache.spark.scheduler.SparkListener that intercepts onExecutorAdded, onBlockManagerAdded and their *Removed counterparts to learn how executors map to BlockManagers (but I believe SparkEnv is enough).
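A sketch of such a listener (assuming Spark 1.4+, where these listener events exist; field names may differ slightly between versions):

```scala
import org.apache.spark.scheduler._

// Logs which BlockManager (and thus host) each executor registers with.
class ExecutorTracker extends SparkListener {
  override def onExecutorAdded(e: SparkListenerExecutorAdded): Unit =
    println(s"executor added: ${e.executorId} on ${e.executorInfo.executorHost}")

  override def onExecutorRemoved(e: SparkListenerExecutorRemoved): Unit =
    println(s"executor removed: ${e.executorId} (${e.reason})")

  override def onBlockManagerAdded(e: SparkListenerBlockManagerAdded): Unit =
    println(s"block manager added: ${e.blockManagerId.executorId} @ " +
      s"${e.blockManagerId.host}:${e.blockManagerId.port}")

  override def onBlockManagerRemoved(e: SparkListenerBlockManagerRemoved): Unit =
    println(s"block manager removed: ${e.blockManagerId}")
}

// Register before running any jobs:
sc.addSparkListener(new ExecutorTracker)
```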



Source: https://stackoverflow.com/questions/30725687/how-to-know-which-worker-a-partition-is-executed-at
