How to get ID of a map task in Spark?

有刺的猬 2020-11-29 12:08

Is there a way to get the ID of a map task in Spark? For example, if each map task calls a user-defined function, can I get the ID of that map task from within that user-defined function?

2 answers
  •  谎友^ (original poster)
    2020-11-29 12:55

    I am not sure what you mean by the ID of a map task, but you can access task information using TaskContext:

    import org.apache.spark.TaskContext
    
    // Four empty partitions are enough: one task runs per partition.
    sc.parallelize(Seq.empty[Int], 4).mapPartitions { _ =>
        // TaskContext.get returns the context of the task running this closure.
        val ctx = TaskContext.get
        val stageId = ctx.stageId
        val partId = ctx.partitionId
        val hostname = java.net.InetAddress.getLocalHost().getHostName()
        Iterator(s"Stage: $stageId, Partition: $partId, Host: $hostname")
    }.collect.foreach(println)
    

    Similar functionality was added to PySpark in Spark 2.2.0 (SPARK-18576):

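    from pyspark import TaskContext
    import socket
    
    def task_info(*_):
        # TaskContext.get() returns the context of the task running on this worker.
        ctx = TaskContext.get()
        return ["Stage: {0}, Partition: {1}, Host: {2}".format(
            ctx.stageId(), ctx.partitionId(), socket.gethostname())]
    
    # One task runs per partition, so four lines are printed.
    for x in sc.parallelize([], 4).mapPartitions(task_info).collect():
        print(x)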
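
If the goal is an identifier that is unique to each task attempt rather than the partition index, a minimal sketch along the same lines could also read `taskAttemptId()` and `attemptNumber()` from the PySpark `TaskContext`. The helper names `describe_task` and `task_info` are hypothetical, and the sketch assumes an active SparkContext named `sc`, as in the snippets above.

```python
def describe_task(ctx):
    """Format the identifiers exposed by a TaskContext-like object."""
    return "Stage: {0}, Partition: {1}, Attempt: {2}, TaskAttemptId: {3}".format(
        ctx.stageId(),        # ID of the stage this task belongs to
        ctx.partitionId(),    # index of the partition being processed
        ctx.attemptNumber(),  # 0 on the first attempt, higher on retries
        ctx.taskAttemptId())  # unique per task attempt within a SparkContext


def task_info(_partition):
    """Meant to run on an executor: describe the task handling this partition."""
    # Imported lazily so this module loads even where PySpark is not installed.
    from pyspark import TaskContext
    ctx = TaskContext.get()  # None when called outside a running task
    return [describe_task(ctx)] if ctx is not None else []
```

With a live context this would be invoked the same way as above, for example `sc.parallelize([], 4).mapPartitions(task_info).collect()`. Keeping the formatting in `describe_task` separate from the `TaskContext.get()` lookup makes it possible to check the output format without a cluster.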
    
