yarn

SparkAction for yarn-cluster

Submitted by 二次信任 on 2019-12-11 03:48:23
Question: Using the Hortonworks HDP 2.3 preview sandbox (Oozie 4.2.0.2.3.0.0-2130, Spark 1.3, and Hadoop 2.7.1.2.3.0.0-2130), I am trying to invoke the Oozie Spark action with "yarn-cluster" as the master. The example provided in the Oozie Spark Action documentation runs the action on the "local" master. The same page also suggests that, to run on YARN, the Spark assembly jar must be available to the Spark action. I have two questions: How do we make the Spark assembly jar available to the Spark action?
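One commonly cited approach is to rely on the Oozie sharelib, which carries the Spark assembly jar, rather than shipping it by hand. Below is a minimal sketch of a Spark action targeting yarn-cluster; the application name, class, and jar path are placeholders, and the schema versions assume Oozie 4.2:

```xml
<!-- Sketch only: assumes the Oozie Spark sharelib is installed and
     oozie.use.system.libpath=true is set in job.properties, so the
     Spark assembly jar is picked up from the sharelib. -->
<workflow-app name="spark-wf" xmlns="uri:oozie:workflow:0.5">
    <start to="spark-node"/>
    <action name="spark-node">
        <spark xmlns="uri:oozie:spark-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <master>yarn-cluster</master>
            <name>MySparkJob</name>
            <class>com.example.MyApp</class>
            <jar>${nameNode}/user/oozie/apps/myapp.jar</jar>
        </spark>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <fail name="fail"><message>Spark action failed</message></fail>
    <end name="end"/>
</workflow-app>
```

With `oozie.use.system.libpath=true` in job.properties, Oozie adds the Spark sharelib jars (including the assembly) to the action's classpath, so nothing needs to be bundled with the workflow.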

Could not deallocate container for task attemptId NNN

Submitted by ぃ、小莉子 on 2019-12-11 03:31:02
Question: I'm trying to understand how YARN allocates container memory and how performance varies across hardware configurations. The machine has 30 GB of RAM; I gave 24 GB to YARN and left 6 GB for the system: yarn.nodemanager.resource.memory-mb=24576. Then I followed http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.0.6.0/bk_installing_manually_book/content/rpm-chap1-11.html to come up with values for map and reduce task memory. I left these two at their default value:
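The linked Hortonworks guide derives its recommendations from a simple container-counting formula. A sketch of that arithmetic (assumed from the guide; check the exact formula for your HDP version):

```python
# Sketch of the Hortonworks container-sizing arithmetic: the container
# count is bounded by cores, disks, and available RAM, and task memory
# settings are derived from the resulting per-container RAM.
def container_sizes(ram_for_yarn_mb, cores, disks, min_container_mb=2048):
    # Number of containers: bounded by 2x cores, ~1.8x disks, and RAM.
    containers = min(2 * cores, int(1.8 * disks),
                     ram_for_yarn_mb // min_container_mb)
    # RAM per container, never below the minimum allocation.
    ram_per_container = max(min_container_mb, ram_for_yarn_mb // containers)
    return {
        "yarn.nodemanager.resource.memory-mb": containers * ram_per_container,
        "yarn.scheduler.minimum-allocation-mb": ram_per_container,
        "mapreduce.map.memory.mb": ram_per_container,
        "mapreduce.reduce.memory.mb": 2 * ram_per_container,
        # Heap is conventionally ~80% of the container size.
        "mapreduce.map.java.opts": f"-Xmx{int(0.8 * ram_per_container)}m",
    }

# Example: 24 GB for YARN on a hypothetical 8-core, 4-disk node.
print(container_sizes(24576, 8, 4))
```

The core count and disk count above are illustrative; plug in your own hardware to see how the map and reduce memory settings fall out.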

Getting application ID from SparkR to create Spark UI url

Submitted by 坚强是说给别人听的谎言 on 2019-12-11 03:28:10
Question: From the SparkR shell, I'd like to generate a link to view the Spark UI while in YARN mode. Normally the Spark UI is at port 4040, but in YARN mode it is apparently at something like [host]:9046/proxy/application_1234567890123_0001/, where the last part of the path is the unique application ID. Other SO answers show how to get the application ID from the Scala and Python shells. How do we get it from SparkR? As a stab in the dark I tried SparkR:::callJMethod(sc, "applicationId")
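A sketch of one possible route, assuming SparkR 1.x internals where `sc` wraps a JavaSparkContext: the underlying Scala SparkContext does expose `applicationId`, so the call may need one extra hop. `callJMethod` is a private SparkR helper and `rm_host` is a placeholder for the ResourceManager host, so treat this as unsupported and version-dependent:

```r
# Sketch: hop from the JavaSparkContext wrapper to the Scala SparkContext,
# which exposes applicationId. Private API; may break across versions.
jsc    <- SparkR:::callJMethod(sc, "sc")
app_id <- SparkR:::callJMethod(jsc, "applicationId")
url    <- paste0("http://", rm_host, ":9046/proxy/", app_id, "/")
```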

Hadoop release versioning is confusing

Submitted by 我与影子孤独终老i on 2019-12-11 03:16:21
Question: I am trying to figure out the different versions of Hadoop, and I got confused after reading this page: Download 1.2.X - current stable version, 1.2 release; 2.2.X - current stable 2.x version; 2.3.X - current 2.x version; 0.23.X - similar to 2.X.X but missing NN HA. Releases may be downloaded from Apache mirrors. Question: I think any release starting with 0.xx is an alpha version and should not be used in production, is that the case? What is the difference between 0.23.X and 2.3.X? it

The auxService:mapreduce_shuffle does not exist

Submitted by 和自甴很熟 on 2019-12-11 03:16:02
Question: When I try to run the command below: # sqoop import --connect jdbc:mysql://IP Address/database --username root --password PASSWORD --table table_name --m 1 to import data from a MySQL database into HDFS, I get the error: The auxService:mapreduce_shuffle does not exist. I have searched and browsed many sites; nothing helped. How do I get rid of this issue? Please let me know if any more inputs are required. Answer 1: It's an entry that you are missing in yarn-site.xml. Apply those entries in
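The commonly cited fix is to register the shuffle auxiliary service in yarn-site.xml on every NodeManager and restart YARN. A sketch of the two properties involved:

```xml
<!-- yarn-site.xml: register the MapReduce shuffle as a NodeManager
     auxiliary service, then restart the NodeManagers. -->
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
```

Without these entries, reduce-side shuffle has no service to pull map output from, which is exactly what the "auxService:mapreduce_shuffle does not exist" error reports.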

YARN job history not appearing

Submitted by 醉酒当歌 on 2019-12-11 03:07:22
Question: I am using the latest Hadoop version, 3.0.0, built from source. I have the timeline service up and running and have configured Hadoop to use it for job history as well. But when I click on History in the ResourceManager UI I get the error below: HTTP ERROR 404 Problem accessing /jobhistory/job/job_1444395439959_0001. Reason: NOT_FOUND Can someone please point out what I am missing here? Following is my yarn-site.xml: <configuration> <!-- Site specific YARN configuration properties -->
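A common cause of this 404 is that the MapReduce JobHistory Server is not running, or the cluster does not know where to find it. A hedged sketch for mapred-site.xml (the hostname is a placeholder; the ports are the Hadoop defaults):

```xml
<!-- mapred-site.xml: point the cluster at a running MapReduce
     JobHistory Server. Hostname below is illustrative. -->
<property>
  <name>mapreduce.jobhistory.address</name>
  <value>historyserver.example.com:10020</value>
</property>
<property>
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>historyserver.example.com:19888</value>
</property>
```

On Hadoop 3.x the daemon is started with `mapred --daemon start historyserver`; the History link in the ResourceManager UI redirects to the webapp address above, so it 404s if nothing is listening there.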

YARN Hadoop 2.4.0: INFO message: ipc.Client retrying connect to server

Submitted by 送分小仙女□ on 2019-12-11 02:52:32
Question: I've searched for two days for a solution, but nothing worked. First, I'm new to the whole Hadoop/YARN/HDFS topic and want to configure a small cluster. The message above doesn't show up every time I run an example from mapreduce-examples.jar: sometimes teragen works, sometimes not. In some cases the whole job fails, in others it finishes successfully. Sometimes the job fails without printing the message above. 14/06/08 15:42:46 INFO ipc.Client: Retrying connect to server: FQDN
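One frequent cause of these retry loops on a small cluster is that worker nodes fall back to the default ResourceManager address (0.0.0.0:8032) because no hostname is configured. A hedged sketch for yarn-site.xml, to be applied identically on every node (the host is a placeholder):

```xml
<!-- yarn-site.xml: set the ResourceManager hostname explicitly so
     NodeManagers and clients do not try the 0.0.0.0 default. -->
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>resourcemanager.example.com</value>
</property>
```

The derived addresses (scheduler, admin, webapp) all pick up this hostname, which also explains the intermittent behavior: jobs that happen to land on a correctly configured node succeed, while others retry against the wrong address.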

Hadoop YARN job is getting stuck at map 0% and reduce 0%

Submitted by 烂漫一生 on 2019-12-11 01:49:23
Question: I am trying to run a very simple job to test my Hadoop setup, so I tried the word count example, which got stuck at 0%. I then tried some other simple jobs, and each one of them got stuck:
14/07/14 23:55:51 INFO mapreduce.Job: Running job: job_1405376352191_0003
14/07/14 23:55:57 INFO mapreduce.Job: Job job_1405376352191_0003 running in uber mode : false
14/07/14 23:55:57 INFO mapreduce.Job: map 0% reduce 0%
I am using Hadoop version Hadoop 2.3.0-cdh5.0.2. I did quick research on Google
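A job that is accepted but sits at map 0% / reduce 0% often means YARN cannot allocate any container, typically because the NodeManager offers less memory than the scheduler's minimum allocation. A hedged sketch for yarn-site.xml (values are illustrative, not tuned recommendations):

```xml
<!-- yarn-site.xml: make sure each NodeManager offers at least one
     container's worth of memory above the scheduler minimum. -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>4096</value>
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>1024</value>
</property>
```

If the NodeManager total were smaller than the minimum allocation, every container request would wait forever and the job would hang exactly as shown in the log above; the ResourceManager UI's "Memory Total" column is a quick way to confirm what the nodes actually offer.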

FAILED Error: java.io.IOException: Initialization of all the collectors failed

Submitted by 元气小坏坏 on 2019-12-11 00:30:28
Question: I am getting an error while running my MapReduce word count job:
Error: java.io.IOException: Initialization of all the collectors failed. Error in last collector was :class wordcount.wordmapper
at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:414)
at org.apache.hadoop.mapred.MapTask.access$100(MapTask.java:81)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs
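Note that the "collector" named in the error is the user's own mapper class (wordcount.wordmapper), which suggests the map output collector property was accidentally pointed at the mapper in the driver or job configuration. A hedged sketch of the default setting that property should carry:

```xml
<!-- Sketch: mapreduce.job.map.output.collector.class should name
     Hadoop's sort-buffer implementation (this is its default), not a
     user mapper class. Remove any override that sets it otherwise. -->
<property>
  <name>mapreduce.job.map.output.collector.class</name>
  <value>org.apache.hadoop.mapred.MapTask$MapOutputBuffer</value>
</property>
```

Equally worth checking: that the driver registers the mapper with setMapperClass rather than with one of the output-class setters, since createSortingCollector instantiates whatever class that property resolves to.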

Hadoop Number of Reducers Configuration Options Priority

Submitted by ≯℡__Kan透↙ on 2019-12-10 23:44:20
Question: What are the priorities of the following three options for setting the number of reducers? In other words, if all three are set, which one takes effect? Option 1: setNumReduceTasks(2) within the application code. Option 2: -D mapreduce.job.reduces=2 as a command-line argument. Option 3: through the $HADOOP_CONF_DIR/mapred-site.xml file: <property> <name>mapreduce.job.reduces</name> <value>2</value> </property> Answer 1: You have them ranked in priority order - option 1 will override 2, and 2 will
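For option 2, a sketch of the command-line form (jar, class, and path names are placeholders; -D only works if the driver uses ToolRunner/GenericOptionsParser):

```shell
# Sketch: per-invocation override of the reducer count (option 2).
# This beats mapred-site.xml but loses to setNumReduceTasks in code.
hadoop jar myapp.jar com.example.MyJob -D mapreduce.job.reduces=2 input/ output/
```

The ordering makes sense if you think of each layer as a default for the next: mapred-site.xml seeds the Configuration, -D overwrites that entry at submission time, and setNumReduceTasks writes to the same property last, inside the driver.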