What do the numbers on the progress bar mean in spark-shell?

清酒与你 2020-12-04 12:08

In my spark-shell, what do entries like the one below mean when I execute a function?

[Stage7:===========>                              (14174 + 5) / 62500]


        
2 Answers
  •  生来不讨喜
    2020-12-04 12:31

    Let's assume you see the following (X, A, B, and C are always non-negative integers):

    [Stage X:==========>            (A + B) / C]
    

    (For example, in the question X = 7, A = 14174, B = 5, and C = 62500.)
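
To make the mapping between the line and the four numbers concrete, here is a minimal sketch in plain Python (no Spark needed) that pulls X, A, B, and C out of such a line. The name `parse_progress` and the regular expression are my own illustration, not part of Spark, and the pattern assumes the line looks like the one in the question.

```python
import re

# Pattern for one stage entry of the console progress bar.
# The optional whitespace after "Stage" covers both "Stage7" (as pasted in
# the question) and "Stage 7"; the closing bracket is optional as well.
PROGRESS_RE = re.compile(
    r"\[Stage\s*(\d+):[=>\s]*\((\d+)\s*\+\s*(\d+)\)\s*/\s*(\d+)\]?"
)

def parse_progress(line):
    """Return (X, A, B, C) as integers, or None if the line does not match."""
    match = PROGRESS_RE.search(line)
    if match is None:
        return None
    stage, finished, running, total = (int(g) for g in match.groups())
    return stage, finished, running, total

line = "[Stage7:===========>                              (14174 + 5) / 62500]"
stage, finished, running, total = parse_progress(line)
print(f"stage={stage} finished={finished} running={running} total={total}")
# Tasks that have not started yet are whatever is left of C after A and B.
print(f"not started yet = {total - finished - running}")
print(f"share finished  = {100 * finished / total:.1f}%")
```

For the line from the question, this reports 48321 tasks that have not started yet and roughly 22.7% of tasks finished.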

    Here is what is going on at a high level: Spark breaks the work into stages, and each stage into tasks. The progress indicator says that Stage X consists of C tasks. During execution, A and B start at zero and keep changing: A is the number of tasks that have already finished, and B is the number of tasks currently running. For a stage with many tasks (far more than the cluster can run at once), expect B to grow until it matches the number of tasks your cluster can run in parallel, which is typically the total number of executor cores rather than the number of worker machines; after that, A increases as tasks complete. Towards the end, as the last few tasks execute, B decreases until it reaches 0, at which point A equals C, the stage is done, and Spark moves on to the next stage. C stays constant the whole time: it is the total number of tasks in the stage and never changes.
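
The way A and B evolve can be illustrated with a toy simulation. This is only a sketch in plain Python, not how Spark schedules tasks: `simulate_stage` and its `slots` parameter are hypothetical names, and it assumes every task takes exactly one "round", so all running tasks finish together.

```python
def simulate_stage(total_tasks, slots):
    """Yield (finished, running) snapshots, i.e. (A, B), one per round.

    Simplifying assumption: every task takes exactly one round, so all
    running tasks finish at the same moment. Real tasks finish at
    different times, so the real counters change far less regularly.
    """
    if slots < 1:
        raise ValueError("slots must be at least 1")
    finished = 0
    running = 0
    while finished < total_tasks:
        # Fill the free slots with tasks that have not started yet.
        not_started = total_tasks - finished - running
        running += min(slots - running, not_started)
        yield finished, running
        # Everything that was running completes in this round.
        finished += running
        running = 0
    # Final snapshot: nothing running, A equals C.
    yield finished, running

for a, b in simulate_stage(total_tasks=10, slots=4):
    print(f"({a} + {b}) / 10")
```

With 10 tasks and 4 slots, B sits at 4 while there is enough work, drops to 2 for the last partial round, and ends at 0 with A equal to C.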

    The ====> part shows the fraction of work done, based on the counts described above. At the beginning the > is near the left edge, and it moves to the right as tasks complete.
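
As a rough illustration of how the position of the > follows from A and C, the sketch below draws a similar line in plain Python. `render_bar` is a hypothetical helper, not Spark's own code, and the bar width of 40 characters is an arbitrary assumption (the real bar adapts to the terminal width).

```python
def render_bar(stage, finished, running, total, width=40):
    """Draw one progress line; `width` is the number of bar characters."""
    # Index of the '>' marker, proportional to the share of finished tasks.
    marker = width * finished // total
    cells = []
    for i in range(width):
        if i < marker:
            cells.append("=")   # work already done
        elif i == marker:
            cells.append(">")   # current position
        else:
            cells.append(" ")   # work still to do
    bar = "".join(cells)
    return f"[Stage {stage}:{bar}({finished} + {running}) / {total}]"

print(render_bar(7, 14174, 5, 62500))
```

Only A and C influence where the > lands in this sketch; B appears in the counters but does not move the marker.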
