What is yarn-client mode in Spark?

后端 未结 6 1179
既然无缘
既然无缘 2020-12-07 12:23

Apache Spark has recently updated the version to 0.8.1, in which yarn-client mode is available. My question is, what does yarn-client mode really mean? In the d

6条回答
  •  青春惊慌失措
    2020-12-07 13:02

    Both spark and yarn are distributed framework , but their roles are different:

    Yarn is a resource management framework, for each application, it has following roles:

    ApplicationMaster: resource management of a single application, including ask for/release resource from Yarn for the application and monitor.

    Attempt: an attempt is just a normal process which does part of the whole job of the application. For example , a mapreduce job which consists of multiple mappers and reducers , each mapper and reducer is an Attempt.

    A common process of summiting a application to yarn is:

    1. The client submit the application request to yarn. In the request, Yarn should know the ApplicationMaster class; For SparkApplication, it is org.apache.spark.deploy.yarn.ApplicationMaster,for MapReduce job , it is org.apache.hadoop.mapreduce.v2.app.MRAppMaster.

    2. Yarn allocate some resource for the ApplicationMaster process and start the ApplicationMaster process in one of the cluster nodes;

    3. After ApplicationMaster starts, ApplicationMaster will request resource from Yarn for this Application and start up worker;

    For Spark, the distributed computing framework, a computing job is divided into many small tasks and each Executor will be responsible for each task, the Driver will collect the result of all Executor tasks and get a global result. A spark application has only one driver with multiple executors.

    So, then ,the problem comes when Spark is using Yarn as a resource management tool in a cluster:

    • In Yarn Cluster Mode, Spark client will submit spark application to yarn, both Spark Driver and Spark Executor are under the supervision of yarn. In yarn's perspective, Spark Driver and Spark Executor have no difference, but normal java processes, namely an application worker process. So, when the client process is gone , e.g. the client process is terminated or killed, the Spark Application on yarn is still running.

    • In yarn client mode, only the Spark Executor are under the
      supervision of yarn. The Yarn ApplicationMaster will request resource for just spark executor. The driver program is running in the client process which have nothing to do with yarn, just a process submitting application to yarn.So ,when the client leave, e.g. the client
      process exits, the Driver is down and the computing terminated.

提交回复
热议问题