Spark MLlib - trainImplicit warning


A similar problem was described on the Apache Spark user mailing list - http://apache-spark-user-list.1001560.n3.nabble.com/Large-Task-Size-td9539.html

I think you can try to play with the number of partitions (using the repartition() method), depending on how many hosts, how much RAM, and how many CPUs you have. A sketch follows below.
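For example, here is a minimal sketch of repartitioning the ratings RDD before calling trainImplicit; the file path, partition count, and ALS parameters are illustrative assumptions to tune for your cluster, not recommendations:

    import org.apache.spark.mllib.recommendation.{ALS, Rating}
    import org.apache.spark.rdd.RDD

    // Assumed input: a CSV of "user,product,confidence" (hypothetical path)
    val ratings: RDD[Rating] = sc.textFile("hdfs:///data/ratings.csv").map { line =>
      val Array(user, product, count) = line.split(',')
      Rating(user.toInt, product.toInt, count.toDouble)
    }

    // Spread the data over more (or fewer) partitions so each serialized
    // task stays small; 64 is just a starting point, not a fixed rule.
    val repartitioned = ratings.repartition(64)

    val model = ALS.trainImplicit(repartitioned,
      rank = 10, iterations = 10, lambda = 0.01, alpha = 1.0)

As a rough rule of thumb, the Spark tuning guide suggests a few partitions per CPU core in the cluster.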

Also try to investigate all steps via the Web UI, where you can see the number of stages, the memory usage at each stage, and data locality.

Or just ignore these warnings as long as everything works correctly and fast.

This warning is hard-coded in Spark (scheduler/TaskSetManager.scala):

      if (serializedTask.limit > TaskSetManager.TASK_SIZE_TO_WARN_KB * 1024 &&
          !emittedTaskSizeWarning) {
        emittedTaskSizeWarning = true
        logWarning(s"Stage ${task.stageId} contains a task of very large size " +
          s"(${serializedTask.limit / 1024} KB). The maximum recommended task size is " +
          s"${TaskSetManager.TASK_SIZE_TO_WARN_KB} KB.")
      }

The threshold itself is defined in the companion object of the same file:

private[spark] object TaskSetManager {
  // The user will be warned if any stages contain a task that has a serialized size greater than
  // this.
  val TASK_SIZE_TO_WARN_KB = 100
} 