
Apache Spark: Yarn logs Analysis

心不动则不痛 Submitted on 2019-12-13 02:26:06
Question: I have a Spark Streaming application, and I want to analyse the job's logs using Elasticsearch-Kibana. The job runs on a YARN cluster, so the logs are written to HDFS because I have set yarn.log-aggregation-enable to true. But when I try to do this: hadoop fs -cat ${yarn.nodemanager.remote-app-log-dir}/${user.name}/logs/<application ID> I see what looks like encrypted/compressed data. What file format is this? How can I read the logs from this file? Can I use Logstash to read them?
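A note on what that binary data is: YARN's log aggregation packs the container logs into a binary TFile container on HDFS, which is why hadoop fs -cat shows unreadable bytes. A minimal sketch of reading them with the standard yarn logs CLI, assuming log aggregation is enabled; the application ID is a placeholder:

    # Decode the aggregated logs for one application into plain text
    yarn logs -applicationId application_1576200000000_0001 > app.log

    # app.log is now ordinary text, so a Logstash file input (or any other
    # shipper) can forward it to Elasticsearch; the pipeline itself is not
    # covered by the original question.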

Failing oozie launcher on yarn-cluster mode

江枫思渺然 Submitted on 2019-12-13 02:06:28
Question: I'm trying to run a Spark job in yarn-cluster mode (it ran successfully in local mode and yarn-client mode), but I'm running into a problem where the Oozie launcher fails. Below is the error message from stderr: Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.JavaMain], main() threw exception, java.lang.NoSuchMethodError: org.apache.spark.network.util.JavaUtils.byteStringAsBytes(Ljava/lang/String;)J org.apache.oozie.action.hadoop.JavaMainException: java.lang.NoSuchMethodError:
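This flavor of NoSuchMethodError usually points at mixed Spark versions on the launcher classpath (JavaUtils.byteStringAsBytes only exists in newer Spark, so an older spark-network-common jar is likely shadowing it). A sketch of one common remedy, assuming the cluster's Oozie sharelib carries a consistent set of Spark jars; the sharelib name here is an assumption:

    # job.properties: make the action resolve jars from the system sharelib
    oozie.use.system.libpath=true
    # Pin the Spark sharelib explicitly if several are installed
    # (the name 'spark' is an assumption; check with: oozie admin -shareliblist)
    oozie.action.sharelib.for.spark=spark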

YARN Resource Tuning

人盡茶涼 Submitted on 2019-12-13 01:40:33
I. Overview
Every job submitted to YARN is allocated Containers to run in, and each container needs resources to run: CPU and memory.

1. CPU resource scheduling
YARN models CPU as virtual cores (vcores), a concept YARN introduces itself, because CPU power differs between servers: one machine may have twice the compute power of another, and the gap can be compensated for by configuring more vcores on the stronger machine. The CPU-related settings in YARN are:

yarn.nodemanager.resource.cpu-vcores: the number of vcores YARN may use on that node, default 8. Setting it equal to the physical core count is recommended; if the node has fewer than 8 cores, lower this value, because YARN does not detect the physical core count on its own. On a high-performance machine it can be set to twice the physical core count.

yarn.scheduler.minimum-allocation-vcores: the minimum number of vcores a single task can request, default 1.

yarn.scheduler.maximum-allocation-vcores: the maximum number of vcores a single task can request, default 4; a request exceeding this limit throws InvalidResourceRequestException.

2. Memory resource scheduling
YARN generally lets you configure the physical memory available on each node, where "available" means the machine's memory minus what HDFS, HBase and so on consume. yarn
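A sketch of how these settings look in yarn-site.xml; the values are illustrative, assuming a 16-core node whose memory is mostly dedicated to YARN:

    <!-- yarn-site.xml (illustrative values for a 16-core node) -->
    <property>
      <name>yarn.nodemanager.resource.cpu-vcores</name>
      <value>16</value> <!-- match the physical core count -->
    </property>
    <property>
      <name>yarn.scheduler.minimum-allocation-vcores</name>
      <value>1</value>
    </property>
    <property>
      <name>yarn.scheduler.maximum-allocation-vcores</name>
      <value>8</value> <!-- larger requests throw InvalidResourceRequestException -->
    </property>
    <property>
      <name>yarn.nodemanager.resource.memory-mb</name>
      <value>49152</value> <!-- what is left after HDFS, HBase etc., here 48 GB -->
    </property>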

YARN Timeline service crashing as it couldn't create a directory under /tmp/

情到浓时终转凉″ Submitted on 2019-12-13 00:41:48
Question: I'm trying to set up the Hadoop history (Timeline) service with the Advanced Configuration from the Apache TimelineServer documentation. I launch the service with the following command: $ yarn-daemon.sh start historyserver. I then see the ApplicationHistoryServer up and running, but after a few moments it crashes with the following exception (from yarn-arbi-historyserver-annaba.log): 2014-08-28 18:34:21,974 FATAL org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer: Error
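The title suggests the LevelDB timeline store cannot create its working directory under /tmp. A minimal sketch of pointing it at a directory the service user owns; the path and the yarn user/group are assumptions, not from the original post:

    <!-- yarn-site.xml: move the timeline store somewhere writable and durable -->
    <property>
      <name>yarn.timeline-service.leveldb-timeline-store.path</name>
      <value>/var/lib/hadoop-yarn/timeline</value>
    </property>

    # Create the directory and hand it to the user running the daemon
    mkdir -p /var/lib/hadoop-yarn/timeline
    chown yarn:yarn /var/lib/hadoop-yarn/timeline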

Spark enable security with secure YARN Hadoop cluster

别说谁变了你拦得住时间么 Submitted on 2019-12-12 16:51:25
Question: I have a Hadoop 3.0 cluster configured with Kerberos. Everything works fine and YARN is started as well. Now I wish to add Spark on top of it and make full use of Hadoop and its security. To do so I use a binary distribution of Spark 2.3 and modified the following. In spark-env.sh: YARN_CONF_DIR, set to the folder where my Hadoop configuration files core-site.xml, hdfs-site.xml and yarn-site.xml are located. In spark-defaults.conf: spark.master yarn spark.submit.deployMode cluster spark
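On a kerberized cluster the piece that usually matters most is giving Spark a principal and keytab so it can obtain and renew Hadoop delegation tokens for long-running jobs. A minimal sketch for Spark 2.3 on YARN; the principal and keytab path are placeholders (Spark 3.x renames these properties to spark.kerberos.principal and spark.kerberos.keytab):

    # spark-defaults.conf
    spark.master                yarn
    spark.submit.deployMode     cluster
    spark.yarn.principal        spark@EXAMPLE.COM
    spark.yarn.keytab           /etc/security/keytabs/spark.keytab

    # Alternatively, kinit as the user first and pass the same pair
    # on the command line:
    # spark-submit --principal spark@EXAMPLE.COM \
    #              --keytab /etc/security/keytabs/spark.keytab ...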

How to launch Spark's ApplicationMaster on a particular node in YARN cluster?

和自甴很熟 Submitted on 2019-12-12 11:58:59
Question: I have a YARN cluster with a master node running the ResourceManager and 2 other nodes. I am able to submit a Spark application from a client machine in "yarn-cluster" mode. Is there a way I can configure which node in the cluster launches the Spark ApplicationMaster? I ask because if the ApplicationMaster launches on the master node it works fine, but if it starts on the other nodes I get: Retrying connect to server: 0.0.0.0/0.0.0.0:8030, and the job is simply accepted and never runs. Answer 1: If
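The 0.0.0.0:8030 retry is the telling detail: port 8030 is the ResourceManager scheduler address, and an ApplicationMaster falls back to 0.0.0.0 when the node it landed on has no yarn-site.xml entry naming the real ResourceManager. A sketch of the usual fix, assuming the RM host is called master-node; this yarn-site.xml must be present on every node, not just the master:

    <property>
      <name>yarn.resourcemanager.hostname</name>
      <value>master-node</value>
    </property>
    <!-- yarn.resourcemanager.scheduler.address then defaults to master-node:8030 -->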

setup/run spark (spark-shell) on yarn client mode

不打扰是莪最后的温柔 Submitted on 2019-12-12 11:49:49
Question: I am trying to get spark-shell working with YARN, but when I run it like this:

    spark-shell \
      --master yarn \
      --deploy-mode client \
      --driver-memory 1g \
      --executor-memory 1g \
      --executor-cores 1

the stack trace I get is: 17/02/07 01:52:41 ERROR spark.SparkContext: Error initializing SparkContext. org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master. at org.apache.spark.scheduler.cluster
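"Yarn application has already ended" at startup usually means the ApplicationMaster container died right away. Two common causes to rule out, sketched below: the client not finding the cluster configuration, and the NodeManager killing the AM for exceeding virtual-memory limits. Treat both as hypotheses to confirm in the application logs, not a guaranteed fix:

    # 1. Make sure the client can see the cluster configuration
    export HADOOP_CONF_DIR=/etc/hadoop/conf

    # 2. If the AM is killed for virtual-memory overuse, relax the check
    #    in yarn-site.xml on the NodeManagers:
    #    <property>
    #      <name>yarn.nodemanager.vmem-check-enabled</name>
    #      <value>false</value>
    #    </property>

    # Then read the actual failure reason:
    yarn logs -applicationId <application ID from the ResourceManager UI>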

Hadoop: specify yarn queue for distcp

≡放荡痞女 Submitted on 2019-12-12 10:59:29
Question: On our cluster we have set up dynamic resource pools. The rules are set so that YARN first looks at the specified queue, then at the username, then at the primary group... However, with a distcp I can't seem to specify a queue; it just falls back to the primary group. This is how I run it now (which doesn't work): hadoop distcp -Dmapred.job.queue.name:root.default ....... Answer 1: You are making a mistake in the specification of the parameter. You should not use ":" for separating the
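Completing the answer's point from standard Hadoop CLI syntax: -D options take key=value, not key:value, so the queue name above was never parsed at all. A sketch of the corrected call; source and destination paths are placeholders:

    # '=' separates key and value in -D options
    hadoop distcp -Dmapred.job.queue.name=root.default hdfs:///src hdfs:///dst

    # On current Hadoop the non-deprecated property name is:
    hadoop distcp -Dmapreduce.job.queuename=root.default hdfs:///src hdfs:///dst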

Successful task generates mapreduce.counters.LimitExceededException when trying to commit

南笙酒味 Submitted on 2019-12-12 10:49:26
Question: I have a Pig script running in MapReduce mode that has been hitting a persistent error which I've been unable to fix. The script spawns multiple MapReduce applications; after running for several hours, one of the applications registers as SUCCEEDED but returns the following diagnostic message: We crashed after successfully committing. Recovering. The step that causes the failure tries to perform a RANK over a dataset that's around 100GB, split across roughly 1000 mapreduce output files
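Given the mapreduce.counters.LimitExceededException in the title, one hypothesis worth testing is that the job tripped the default cap of 120 counters; a RANK over roughly 1000 files can generate many per-file counters. A sketch of raising the limit, assuming the cluster permits the override; the value 500 is illustrative:

    -- at the top of the Pig script
    SET mapreduce.job.counters.max 500;

    -- or equivalently on the command line:
    -- pig -Dmapreduce.job.counters.max=500 script.pig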

Yarn application not getting killed even after Application Master is terminated

岁酱吖の Submitted on 2019-12-12 03:29:15
Question: My application is suffering because of this issue: even after killing the ApplicationMaster, the application does not actually get killed. It's a known YARN issue, YARN-3561. It occurs out of the blue, so I have developed a fix in my application and I want to test it. But as of now the issue is not reproducing. Is there any sure-shot way of replicating it so I can verify my fix? Answer 1: I was able to replicate this by launching the application as a daemon process by using
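A sketch of the kind of reproduction the truncated answer hints at: killing the ApplicationMaster process directly on its node, then watching whether YARN ever notices. The grep pattern and the exact launch mode are assumptions, since the answer cuts off before the details:

    # On the node hosting the AM, find its container process
    ps aux | grep ApplicationMaster

    # Kill it hard so it cannot deregister cleanly
    kill -9 <AM pid>

    # The application should eventually move to FAILED or KILLED;
    # under YARN-3561 it stays RUNNING instead
    yarn application -status <application ID>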