yarn

SparkAction for yarn-cluster

Submitted by 二次信任 on 2019-12-11 03:48:23
Question: Using the Hortonworks HDP 2.3 preview sandbox (Oozie 4.2.0.2.3.0.0-2130, Spark 1.3, and Hadoop 2.7.1.2.3.0.0-2130), I am trying to invoke the Oozie Spark action with "yarn-cluster" as the master. The example provided in the Oozie Spark Action documentation runs the action on the "local" master. The same page also suggests that, to run on YARN, the Spark assembly jar must be available to the Spark action. I have two questions: How do we make the Spark assembly jar available to the Spark action?
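One commonly cited approach is to rely on the Oozie sharelib, which carries the Spark assembly jar, rather than shipping it by hand. Below is a minimal sketch of a Spark action targeting yarn-cluster; the application name, class, and jar path are placeholders, and the schema versions assume Oozie 4.2:

```xml
<!-- Sketch only: assumes the Oozie Spark sharelib is installed and
     oozie.use.system.libpath=true is set in job.properties, so the
     Spark assembly jar is picked up from the sharelib. -->
<workflow-app name="spark-wf" xmlns="uri:oozie:workflow:0.5">
    <start to="spark-node"/>
    <action name="spark-node">
        <spark xmlns="uri:oozie:spark-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <master>yarn-cluster</master>
            <name>MySparkJob</name>
            <class>com.example.MyApp</class>
            <jar>${nameNode}/user/oozie/apps/myapp.jar</jar>
        </spark>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <fail name="fail"><message>Spark action failed</message></fail>
    <end name="end"/>
</workflow-app>
```

With `oozie.use.system.libpath=true` in job.properties, Oozie adds the Spark sharelib jars (including the assembly) to the action's classpath, so nothing needs to be bundled with the workflow.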

Could not deallocate container for task attemptId NNN

Submitted by ぃ、小莉子 on 2019-12-11 03:31:02
Question: I'm trying to understand how YARN allocates container memory and how performance varies across hardware configurations. The machine has 30 GB of RAM; I gave 24 GB to YARN and left 6 GB for the system: yarn.nodemanager.resource.memory-mb=24576. Then I followed http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.0.6.0/bk_installing_manually_book/content/rpm-chap1-11.html to come up with values for map and reduce task memory. I left these two at their default value:
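The linked Hortonworks guide derives its recommendations from a simple container-counting formula. A sketch of that arithmetic (assumed from the guide; check the exact formula for your HDP version):

```python
# Sketch of the Hortonworks container-sizing arithmetic: the container
# count is bounded by cores, disks, and available RAM, and task memory
# settings are derived from the resulting per-container RAM.
def container_sizes(ram_for_yarn_mb, cores, disks, min_container_mb=2048):
    # Number of containers: bounded by 2x cores, ~1.8x disks, and RAM.
    containers = min(2 * cores, int(1.8 * disks),
                     ram_for_yarn_mb // min_container_mb)
    # RAM per container, never below the minimum allocation.
    ram_per_container = max(min_container_mb, ram_for_yarn_mb // containers)
    return {
        "yarn.nodemanager.resource.memory-mb": containers * ram_per_container,
        "yarn.scheduler.minimum-allocation-mb": ram_per_container,
        "mapreduce.map.memory.mb": ram_per_container,
        "mapreduce.reduce.memory.mb": 2 * ram_per_container,
        # Heap is conventionally ~80% of the container size.
        "mapreduce.map.java.opts": f"-Xmx{int(0.8 * ram_per_container)}m",
    }

# Example: 24 GB for YARN on a hypothetical 8-core, 4-disk node.
print(container_sizes(24576, 8, 4))
```

The core count and disk count above are illustrative; plug in your own hardware to see how the map and reduce memory settings fall out.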

Getting application ID from SparkR to create Spark UI url

Submitted by 坚强是说给别人听的谎言 on 2019-12-11 03:28:10
Question: From the SparkR shell, I'd like to generate a link to view the Spark UI while in YARN mode. Normally the Spark UI is at port 4040, but in YARN mode it is apparently at something like [host]:9046/proxy/application_1234567890123_0001/, where the last part of the path is the unique application ID. Other SO answers show how to get the application ID from the Scala and Python shells. How do we get it from SparkR? As a stab in the dark I tried SparkR:::callJMethod(sc, "applicationId")
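A sketch of one possible route, assuming SparkR 1.x internals where `sc` wraps a JavaSparkContext: the underlying Scala SparkContext does expose `applicationId`, so the call may need one extra hop. `callJMethod` is a private SparkR helper and `rm_host` is a placeholder for the ResourceManager host, so treat this as unsupported and version-dependent:

```r
# Sketch: hop from the JavaSparkContext wrapper to the Scala SparkContext,
# which exposes applicationId. Private API; may break across versions.
jsc    <- SparkR:::callJMethod(sc, "sc")
app_id <- SparkR:::callJMethod(jsc, "applicationId")
url    <- paste0("http://", rm_host, ":9046/proxy/", app_id, "/")
```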

Hadoop release versioning is confusing

Submitted by 我与影子孤独终老i on 2019-12-11 03:16:21
Question: I am trying to figure out the different versions of Hadoop, and I got confused after reading this page: Download 1.2.X - current stable version, 1.2 release; 2.2.X - current stable 2.x version; 2.3.X - current 2.x version; 0.23.X - similar to 2.X.X but missing NN HA. Releases may be downloaded from Apache mirrors. Question: I think any release starting with 0.xx is an alpha version and should not be used in production, is that the case? What is the difference between 0.23.X and 2.3.X? it

The auxService:mapreduce_shuffle does not exist

Submitted by 和自甴很熟 on 2019-12-11 03:16:02
Question: When I try to run the command below: # sqoop import --connect jdbc:mysql://IP Address/database --username root --password PASSWORD --table table_name --m 1 to import data from a MySQL database into HDFS, I get the error: The auxService:mapreduce_shuffle does not exist. I have searched and browsed many sites; nothing helped. How do I get rid of this issue? Please let me know if any more inputs are required. Answer 1: It's an entry that you are missing in yarn-site.xml. Apply those entries in
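The commonly cited fix is to register the shuffle auxiliary service in yarn-site.xml on every NodeManager and restart YARN. A sketch of the two properties involved:

```xml
<!-- yarn-site.xml: register the MapReduce shuffle as a NodeManager
     auxiliary service, then restart the NodeManagers. -->
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
```

Without these entries, reduce-side shuffle has no service to pull map output from, which is exactly what the "auxService:mapreduce_shuffle does not exist" error reports.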

YARN job history not appearing

Submitted by 醉酒当歌 on 2019-12-11 03:07:22
Question: I am using the latest Hadoop version, 3.0.0, built from source. I have the timeline service up and running and have configured Hadoop to use it for job history as well. But when I click on History in the ResourceManager UI I get the error below: HTTP ERROR 404 Problem accessing /jobhistory/job/job_1444395439959_0001. Reason: NOT_FOUND Can someone please point out what I am missing here? Following is my yarn-site.xml: <configuration> <!-- Site specific YARN configuration properties -->
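A common cause of this 404 is that the MapReduce JobHistory Server is not running, or the cluster does not know where to find it. A hedged sketch for mapred-site.xml (the hostname is a placeholder; the ports are the Hadoop defaults):

```xml
<!-- mapred-site.xml: point the cluster at a running MapReduce
     JobHistory Server. Hostname below is illustrative. -->
<property>
  <name>mapreduce.jobhistory.address</name>
  <value>historyserver.example.com:10020</value>
</property>
<property>
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>historyserver.example.com:19888</value>
</property>
```

On Hadoop 3.x the daemon is started with `mapred --daemon start historyserver`; the History link in the ResourceManager UI redirects to the webapp address above, so it 404s if nothing is listening there.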

YARN Hadoop 2.4.0: INFO message: ipc.Client retrying connect to server

Submitted by 送分小仙女□ on 2019-12-11 02:52:32
Question: I've searched for two days for a solution, but nothing worked. First, I'm new to the whole Hadoop/YARN/HDFS topic and want to configure a small cluster. The message above doesn't show up every time I run an example from mapreduce-examples.jar: sometimes teragen works, sometimes not. In some cases the whole job fails, in others it finishes successfully. Sometimes the job fails without printing the message above. 14/06/08 15:42:46 INFO ipc.Client: Retrying connect to server: FQDN
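One frequent cause of these retry loops on a small cluster is that worker nodes fall back to the default ResourceManager address (0.0.0.0:8032) because no hostname is configured. A hedged sketch for yarn-site.xml, to be applied identically on every node (the host is a placeholder):

```xml
<!-- yarn-site.xml: set the ResourceManager hostname explicitly so
     NodeManagers and clients do not try the 0.0.0.0 default. -->
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>resourcemanager.example.com</value>
</property>
```

The derived addresses (scheduler, admin, webapp) all pick up this hostname, which also explains the intermittent behavior: jobs that happen to land on a correctly configured node succeed, while others retry against the wrong address.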

Hadoop YARN job is getting stuck at map 0% and reduce 0%

Submitted by 烂漫一生 on 2019-12-11 01:49:23
Question: I am trying to run a very simple job to test my Hadoop setup, so I tried the word count example, which got stuck at 0%. I then tried some other simple jobs, and each one of them got stuck:
14/07/14 23:55:51 INFO mapreduce.Job: Running job: job_1405376352191_0003
14/07/14 23:55:57 INFO mapreduce.Job: Job job_1405376352191_0003 running in uber mode : false
14/07/14 23:55:57 INFO mapreduce.Job: map 0% reduce 0%
I am using Hadoop version Hadoop 2.3.0-cdh5.0.2. I did quick research on Google
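A job that is accepted but sits at map 0% / reduce 0% often means YARN cannot allocate any container, typically because the NodeManager offers less memory than the scheduler's minimum allocation. A hedged sketch for yarn-site.xml (values are illustrative, not tuned recommendations):

```xml
<!-- yarn-site.xml: make sure each NodeManager offers at least one
     container's worth of memory above the scheduler minimum. -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>4096</value>
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>1024</value>
</property>
```

If the NodeManager total were smaller than the minimum allocation, every container request would wait forever and the job would hang exactly as shown in the log above; the ResourceManager UI's "Memory Total" column is a quick way to confirm what the nodes actually offer.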

FAILED Error: java.io.IOException: Initialization of all the collectors failed

Submitted by 元气小坏坏 on 2019-12-11 00:30:28
Question: I am getting an error while running my MapReduce word count job:
Error: java.io.IOException: Initialization of all the collectors failed. Error in last collector was :class wordcount.wordmapper
at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:414)
at org.apache.hadoop.mapred.MapTask.access$100(MapTask.java:81)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs
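Note that the "collector" named in the error is the user's own mapper class (wordcount.wordmapper), which suggests the map output collector property was accidentally pointed at the mapper in the driver or job configuration. A hedged sketch of the default setting that property should carry:

```xml
<!-- Sketch: mapreduce.job.map.output.collector.class should name
     Hadoop's sort-buffer implementation (this is its default), not a
     user mapper class. Remove any override that sets it otherwise. -->
<property>
  <name>mapreduce.job.map.output.collector.class</name>
  <value>org.apache.hadoop.mapred.MapTask$MapOutputBuffer</value>
</property>
```

Equally worth checking: that the driver registers the mapper with setMapperClass rather than with one of the output-class setters, since createSortingCollector instantiates whatever class that property resolves to.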

Hadoop Number of Reducers Configuration Options Priority

Submitted by ≯℡__Kan透↙ on 2019-12-10 23:44:20
Question: What are the priorities of the following three options for setting the number of reducers? In other words, if all three are set, which one takes effect? Option 1: setNumReduceTasks(2) within the application code. Option 2: -D mapreduce.job.reduces=2 as a command-line argument. Option 3: through the $HADOOP_CONF_DIR/mapred-site.xml file: <property> <name>mapreduce.job.reduces</name> <value>2</value> </property> Answer 1: You have them ranked in priority order - option 1 will override 2, and 2 will
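For option 2, a sketch of the command-line form (jar, class, and path names are placeholders; -D only works if the driver uses ToolRunner/GenericOptionsParser):

```shell
# Sketch: per-invocation override of the reducer count (option 2).
# This beats mapred-site.xml but loses to setNumReduceTasks in code.
hadoop jar myapp.jar com.example.MyJob -D mapreduce.job.reduces=2 input/ output/
```

The ordering makes sense if you think of each layer as a default for the next: mapred-site.xml seeds the Configuration, -D overwrites that entry at submission time, and setNumReduceTasks writes to the same property last, inside the driver.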