yarn

Hadoop multi-node cluster too slow. How do I increase the speed of data processing?

旧城冷巷雨未停 submitted on 2019-12-23 04:53:40
Question: I have a 6-node cluster - 5 DN and 1 NN. All have 32 GB RAM. All slaves have 8.7 TB HDD. DN has 1.1 TB HDD. Here is the link to my core-site.xml, hdfs-site.xml, yarn-site.xml. After running an MR job, I checked my RAM usage, which is shown below:
Namenode: free -g
        total   used   free   shared   buff/cache   available
Mem:       31      7     15        0            8          22
Swap:      31      0     31
Datanode: Slave1: free -g
        total   used   free   shared   buff/cache   available
Mem:       31      6      6        0           18          24
Swap:      31      3     28
Slave2: total used free shared buff/cache
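A first thing to verify (a sketch, assuming the ResourceManager web UI is on the default port 8088) is how much of the 32 GB per node YARN is actually allowed to hand out (yarn.nodemanager.resource.memory-mb, which defaults to only 8 GB) and how busy the cluster really was during the job:

# List the NodeManagers registered with the ResourceManager and their running containers
yarn node -list
# Cluster-wide memory/vcore totals and current allocation, via the RM REST API
curl -s http://<resourcemanager-host>:8088/ws/v1/cluster/metrics

If YARN is still offering only the default 8 GB per node, most of the 32 GB of RAM will sit unused (or in buff/cache, as in the free -g output above) no matter how the job is tuned.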

How long does the Hadoop Resource Manager store the application information?

社会主义新天地 submitted on 2019-12-23 04:32:53
Question: We read the resource usage of various users and applications from the Hadoop Resource Manager using the official REST API. Our problem is that the application history does not last long enough, so it returns -1 values for used cores, memory, and containers. We'd like to extend the duration for which YARN stores this data, but we don't know where to set the value. Answer 1: You should check your mapred-site.xml and look at mapreduce.jobhistory.max-age-ms. As stated in: https://hadoop.apache.org/docs
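For reference, the property the answer points at lives in mapred-site.xml on the JobHistory Server host; a minimal sketch (the value shown is the usual 7-day default, in milliseconds):

<property>
  <name>mapreduce.jobhistory.max-age-ms</name>
  <value>604800000</value>   <!-- 7 days; raise this to keep history longer -->
</property>

The JobHistory Server has to be restarted for the new retention period to take effect.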

Tez job fails when submitted by a different user

血红的双手。 submitted on 2019-12-23 04:20:28
Question: I configured a Hadoop 2.6.0 HA cluster with Kerberos security. When submitting the example job using tez-example-0.6.0.jar on the yarn-tez framework as a different user, I get the exception below: Exception java.io.IOException: The ownership on the staging directory hdfs://clustername/tmp/staging is not as expected. It is owned by Kumar. The directory must be owned by the submitter TestUser or by TestUser The directory has full permissions but I still get the above exception. But when submitting a job
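The message suggests every user is sharing one fixed staging path that the first submitter (Kumar) now owns. A common workaround is to give each submitter their own staging directory; sketched below under the assumption that the staging path is configurable per job (paths and user names are illustrative):

# Pre-create a per-user staging directory owned by the submitting user (run as the HDFS superuser)
hdfs dfs -mkdir -p /tmp/staging/TestUser
hdfs dfs -chown TestUser /tmp/staging/TestUser
# Then point the submission at it, e.g. with -Dtez.staging-dir=/tmp/staging/TestUser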

Spark Yarn Memory configuration

大憨熊 submitted on 2019-12-23 04:09:43
Question: I have a Spark application that keeps failing with the error: "Diagnostics: Container [pid=29328,containerID=container_e42_1512395822750_0026_02_000001] is running beyond physical memory limits. Current usage: 1.5 GB of 1.5 GB physical memory used; 2.3 GB of 3.1 GB virtual memory used. Killing container." I saw lots of different parameters that were suggested to change in order to increase the physical memory. Can I please have some explanation of the following parameters? mapreduce.map.memory.mb
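As a rough illustration of how these knobs fit together for a Spark-on-YARN application (a sketch with purely illustrative sizes; spark.yarn.executor.memoryOverhead / spark.yarn.driver.memoryOverhead are the pre-Spark-2.3 names of the overhead settings):

spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --driver-memory 2g \
  --executor-memory 4g \
  --conf spark.yarn.driver.memoryOverhead=512 \
  --conf spark.yarn.executor.memoryOverhead=512 \
  your-app.jar

YARN kills a container when the JVM heap plus its off-heap overhead exceeds the container's physical memory limit, which is why raising only the heap size is often not enough.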

Spark - application returns different results based on different executor memory?

大憨熊 submitted on 2019-12-23 04:04:27
Question: I am noticing some peculiar behaviour. I have a Spark job which reads data, does some grouping, ordering and joining, and creates an output file. The issue is when I run the same job on YARN with more memory than the environment has, e.g. the cluster has 50 GB and I run spark-submit with close to 60 GB of executor memory and 4 GB of driver memory. My results shrink; it seems like one of the data partitions or tasks is lost during processing. driver-memory 4g --executor-memory 4g --num-executors 12
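One way to confirm whether executors or tasks were actually lost during such a run (a sketch; the application id is a placeholder):

# Pull the aggregated YARN logs for the finished application and look for lost executors
yarn logs -applicationId application_XXXXXXXXXXXXX_XXXX | grep -iE "lost executor|ExecutorLostFailure|killed by YARN"

If executors were killed for exceeding what the cluster could actually grant, that would tie the over-sized memory request to the lost tasks and the smaller output.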

Ant BuildException error building Hadoop 2.2.0

丶灬走出姿态 submitted on 2019-12-23 04:03:16
Question: I've been having trouble building Hadoop 2.2.0 using Maven 3.1.1. This is part of the output I get (full log at http://pastebin.com/FE6vu46M):
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Apache Hadoop Main ................................ SUCCESS [27.471s]
[INFO] Apache Hadoop Project POM ......................... SUCCESS [0.936s]
[INFO] Apache Hadoop Annotations ......................... SUCCESS [3.819s]
[INFO] Apache
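Two things that commonly trip up a Hadoop 2.2.0 build and are worth ruling out before digging into the Ant error itself (a sketch; these flags are illustrative and not the cause identified in this particular log):

# Rebuild without the native profile and ask Maven for the full stack trace
mvn clean package -DskipTests -Dmaven.javadoc.skip=true -e
# Hadoop 2.2.0 expects Protocol Buffers 2.5.0 on the PATH
protoc --version    # should print: libprotoc 2.5.0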

Issue in rollback (after rolling upgrade) from Hadoop 2.7.1 to 2.4.0

微笑、不失礼 submitted on 2019-12-23 03:38:11
Question: I tried to do a rolling upgrade from Hadoop 2.4.0 to Hadoop 2.7.1. As per http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsRollingUpgrade.html#dfsadmin_-rollingUpgrade one can roll back to the previous release provided the finalise step has not been done. I upgraded the setup but did not finalise the upgrade, and tried to roll back HDFS to 2.4.0. I tried the following steps: Shutdown all NNs and DNs. Restore the pre-upgrade release on all machines. Start NN1 as Active with the "
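For reference, the downtime rollback sequence described in the linked Apache document boils down to roughly the following commands once the 2.4.0 binaries are restored (a sketch; the exact wrapper scripts differ between distributions):

# On NN1: start it as Active with the rollback option
hdfs namenode -rollingUpgrade rollback
# On NN2: re-sync its metadata, then start it normally as Standby
hdfs namenode -bootstrapStandby
# On every DataNode: start it with the rollback option
hadoop-daemon.sh start datanode -rollback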

How is virtual memory calculated in Spark?

旧城冷巷雨未停 submitted on 2019-12-23 02:42:32
Question: I am using Spark on Hadoop and want to know how Spark allocates virtual memory to an executor. As per the YARN vmem-pmem ratio, the container gets 2.1 times the virtual memory. Hence - if XMX is 1 GB then --> 1 GB * 2.1 = 2.1 GB is allocated to the container. How does it work on Spark? And is the statement below correct? If I give Executor memory = 1 GB then, Total virtual memory = 1 GB * 2.1 * spark.yarn.executor.memoryOverhead. Is this true? If not, then how is virtual memory for an executor
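A commonly cited back-of-the-envelope calculation for an executor container, assuming the usual defaults of memoryOverhead = max(384 MB, 10% of executor memory) and yarn.nodemanager.vmem-pmem-ratio = 2.1 (a sketch, not an authoritative breakdown):

executor memory             = 1024 MB
memory overhead (default)   = max(384 MB, 0.10 * 1024 MB) = 384 MB
container physical memory   ~ 1024 + 384 = 1408 MB (rounded up to the scheduler's minimum allocation)
virtual memory limit        ~ 1408 MB * 2.1 ~ 2957 MB

In other words, the overhead is added to the executor memory before the 2.1 ratio is applied, rather than multiplied in as the statement above suggests.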

Spark: Connection refused for webapp proxy on YARN

好久不见. submitted on 2019-12-23 01:57:15
Question: I am using Spark and Hadoop in Docker containers: I have 3 containers, a master and 2 slaves. Everything is working properly, but I have a problem with the Spark proxy webapp when running a task. I can connect to the YARN webapp through http://172.20.0.2:8088/ I can also access the nodes with http://172.20.0.3:8042/node and http://172.20.0.3:8043/node But when trying to access Spark monitoring with this address http://172.20.0.2:8088/proxy/application_1521727348200_0001/ I get HTTP ERROR 500 Problem accessing
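Since the proxy simply forwards to the application master's tracking URL, a first step is to see what address the ResourceManager has recorded for the application and whether that address is reachable from where the proxy runs (a sketch; the application id is taken from the URL above):

# Show the application report, including the Tracking-URL the proxy forwards to
yarn application -status application_1521727348200_0001
# Then try that URL directly from the master container, e.g.:
# curl -v http://<tracking-host>:<port>/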

Setting up a React Native (RN) development environment on Mac

六眼飞鱼酱① submitted on 2019-12-23 01:37:09
1. It is recommended to use Homebrew to install Node and Watchman. Run the following commands in the terminal:
brew install node
brew install watchman
If you have already installed Node, check that its version is v8.3 or higher. After installing Node, it is recommended to set an npm registry mirror to speed up the later steps (or use a proxy tool).
Note: do not use cnpm! The module paths cnpm creates are unusual, and the packager cannot resolve them correctly!
npm config set registry https://registry.npm.taobao.org --global
npm config set disturl https://npm.taobao.org/dist --global
Watchman is a tool from Facebook for watching file-system changes. Installing it improves performance during development (the packager can pick up file changes quickly and refresh in real time).
Yarn and the React Native command-line tool (react-native-cli): Yarn is Facebook's alternative to npm, which can speed up downloading node modules. The React Native command-line tool is used to create, initialize, and upgrade projects, run the packager, and so on.
npm install -g yarn react-native
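If you do install Yarn, you will likely also want it to use the same registry mirror configured for npm above (a sketch, assuming the same Taobao mirror):

yarn config set registry https://registry.npm.taobao.org --global
yarn config set disturl https://npm.taobao.org/dist --global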