How Apache Zeppelin computes Spark job progress bar?

Submitted by 时间秒杀一切 on 2020-06-17 07:40:07

Question


When starting spark job from Apache Zeppelin notebook interface it shows you a progress bar of job execution. But what does this progress actually mean? Sometimes it shrinks or expands. Is it a progress of current stage or a whole job?


Answer 1:


In the web interface, the progress bar shows the value returned by the interpreter's getProgress function (not implemented for every interpreter; the Python interpreter, for example, does not provide it).

This function returns a percentage.

When using the Spark interpreter, the value appears to be the percentage of tasks completed, computed by the following progress function from JobProgressUtil:

def progress(sc: SparkContext, jobGroup: String): Int = {
  // All job IDs registered under this paragraph's job group
  val jobIds = sc.statusTracker.getJobIdsForGroup(jobGroup)
  val jobs = jobIds.flatMap { id => sc.statusTracker.getJobInfo(id) }
  // All stages belonging to those jobs
  val stages = jobs.flatMap { job =>
    job.stageIds().flatMap(sc.statusTracker.getStageInfo)
  }

  // Percentage of completed tasks across every stage seen so far
  val taskCount = stages.map(_.numTasks).sum
  val completedTaskCount = stages.map(_.numCompletedTasks).sum
  if (taskCount == 0) {
    0
  } else {
    (100 * completedTaskCount.toDouble / taskCount).toInt
  }
}
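This aggregation also explains why the bar can shrink: when Spark submits a new stage, taskCount grows while completedTaskCount stays the same, so the computed percentage drops. A minimal sketch of the same math in plain Scala (no Spark required; the stage/task numbers are hypothetical examples):

```scala
object ProgressSketch {
  // Each tuple is (numTasks, numCompletedTasks) for one stage,
  // mirroring the fields read from SparkStageInfo above.
  def progressPercent(stages: Seq[(Int, Int)]): Int = {
    val taskCount = stages.map(_._1).sum
    val completedTaskCount = stages.map(_._2).sum
    if (taskCount == 0) 0
    else (100 * completedTaskCount.toDouble / taskCount).toInt
  }

  def main(args: Array[String]): Unit = {
    // One stage, 5 of 10 tasks done: 50%.
    println(progressPercent(Seq((10, 5))))
    // A second stage appears with 10 more tasks, none done yet:
    // the bar "shrinks" to 25% even though no work was undone.
    println(progressPercent(Seq((10, 5), (10, 0))))
  }
}
```

The same effect in reverse (the bar jumping forward) happens when a stage is skipped or completes many tasks at once.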

That said, I could not find this behavior specified in the Zeppelin documentation.



Source: https://stackoverflow.com/questions/56652680/how-apache-zeppelin-computes-spark-job-progress-bar
