Active tasks is a negative number in Spark UI

天涯浪人 2020-12-14 17:44

When using spark-1.6.2 and pyspark, I saw this:

[Spark UI screenshot showing a negative number of active tasks]

where you see that the active tasks are a negative number (the difference of the total tasks from the completed tasks).

2 Answers
  • 2020-12-14 18:22

    It is a Spark issue: it occurs when executors restart after failures. A JIRA issue has already been filed for it; see https://issues.apache.org/jira/browse/SPARK-10141 for more details.

  • 2020-12-14 18:36

    As answered by S. Owen on the Spark-dev mailing list, there are several JIRA tickets relevant to this issue, such as:

    1. ResourceManager UI showing negative value
    2. NodeManager reports negative running containers

    This behavior usually occurs when (many) executors restart after failure(s).


    This behavior can also occur when the application uses too many executors. Use coalesce() to reduce the number of partitions in this case, as shown in the sketch below.

    To be exact, in Prepare my bigdata with Spark via Python, I had >400k partitions. I used data.coalesce(1024), as described in Repartition an RDD, and I was able to bypass that Spark UI bug. You see, partitioning is a very important concept when it comes to distributed computing and Spark.
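
    A minimal PySpark sketch of that workaround, assuming an RDD named data with far too many partitions; the app name and input path are placeholders, and 1024 is just the target that worked in my case:

    ```python
    from pyspark import SparkContext

    sc = SparkContext(appName="coalesce-example")  # hypothetical app name

    # Placeholder input path, for illustration only.
    data = sc.textFile("hdfs:///path/to/bigdata")
    print(data.getNumPartitions())   # in my case this was >400k

    # coalesce() merges existing partitions without a full shuffle,
    # which is the cheap way to *reduce* the partition count.
    data = data.coalesce(1024)
    print(data.getNumPartitions())   # 1024
    ```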

    In my question I also use 1-2k executors, so it is likely related.

    Note: with too few partitions you might hit this Spark Java error: Size exceeds Integer.MAX_VALUE, since a single partition's block cannot exceed 2 GB.
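
    On the flip side, a hedged sketch of raising the partition count again with repartition(), which does a full shuffle, so that no single partition block grows past that 2 GB limit; the target of 4096 is purely illustrative:

    ```python
    # repartition() does a full shuffle and can both increase and decrease
    # the partition count; use it when partitions have become too large.
    data = data.repartition(4096)  # illustrative target; tune for your data size
    print(data.getNumPartitions())
    ```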
