yarn

Yarn : yarn-site.xml changes not taking effect

点点圈 submitted on 2019-12-31 04:13:07
Question: We have a Spark Streaming application running on HDFS 2.7.3 with YARN as the resource manager. While the application runs, these two folders, /tmp/hadoop/data/nm-local-dir/filecache and /tmp/hadoop/data/nm-local-dir/filecache, keep filling up, and so does the disk. From my research, configuring these two properties in yarn-site.xml should help: <property> <name>yarn.nodemanager.localizer.cache.cleanup.interval-ms</name> <value>2000</value> </property> <property> <name>yarn
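Before concluding that YARN ignores a setting, it can help to confirm what the yarn-site.xml on a given NodeManager host actually contains. The Java sketch below (a hypothetical class, YarnSiteReader, using only the JDK's DOM parser) lists the name/value pairs of such a file. As a general rule, NodeManagers read yarn-site.xml when they start, so an edit typically has to be present on every NodeManager host and followed by a NodeManager restart before it applies.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class YarnSiteReader {
    // Returns the <name>/<value> pairs of every <property> in a yarn-site.xml document.
    // If the same name appears twice, the later value overwrites the earlier one.
    static Map<String, String> parse(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Map<String, String> props = new LinkedHashMap<>();
        NodeList properties = doc.getElementsByTagName("property");
        for (int i = 0; i < properties.getLength(); i++) {
            Element property = (Element) properties.item(i);
            String name = childText(property, "name");
            if (name != null) { // skip malformed entries without a <name>
                props.put(name, childText(property, "value"));
            }
        }
        return props;
    }

    // Trimmed text of the first child element with the given tag, or null if it is absent.
    private static String childText(Element parent, String tag) {
        Node node = parent.getElementsByTagName(tag).item(0);
        return node == null ? null : node.getTextContent().trim();
    }
}
```

Reading the file on each host (for example with Files.readString on Java 11+) and looking up yarn.nodemanager.localizer.cache.cleanup.interval-ms in the returned map shows whether the value you expect is really deployed there.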

h2o starting on YARN not working

☆樱花仙子☆ submitted on 2019-12-31 04:06:51
Question: When I start H2O on a CDH cluster, I get the following error. I downloaded everything from the website and followed the tutorial. The command I ran was hadoop jar h2odriver.jar -nodes 2 -mapperXmx 1g -output hdfsOutputDirName. It shows that containers are not being used, and it's not clear which Hadoop settings these would correspond to. I have given memory to all the settings. It's the 0.0 for memory that doesn't make sense: why are the containers not using any memory? Is the cluster even running now? ----- YARN
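A memory total of 0.0 may indicate that the scheduler sees no NodeManager memory to hand out. One rough way to reason about this is to count how many containers of the requested size the cluster could grant. The sketch below (hypothetical class ContainerFitCheck) is a simplification under stated assumptions: requests are rounded up to a multiple of yarn.scheduler.minimum-allocation-mb (CapacityScheduler-style normalization), requests above yarn.scheduler.maximum-allocation-mb are rejected, and each node offers yarn.nodemanager.resource.memory-mb. The container size passed in is hypothetical; h2odriver requests somewhat more than the -mapperXmx heap, and that overhead is not modeled here.

```java
class ContainerFitCheck {
    // Rounds a memory request up to the next multiple of the scheduler's minimum allocation
    // (assumption: CapacityScheduler-style normalization with the default resource calculator).
    static long normalize(long requestMb, long minAllocMb) {
        return ((requestMb + minAllocMb - 1) / minAllocMb) * minAllocMb;
    }

    // Counts how many containers of the requested size the given NodeManagers could host.
    // nodeMemoryMb holds each node's yarn.nodemanager.resource.memory-mb.
    static long containersThatFit(long requestMb, long minAllocMb, long maxAllocMb, long[] nodeMemoryMb) {
        if (requestMb <= 0 || minAllocMb <= 0) {
            throw new IllegalArgumentException("request and minimum allocation must be positive");
        }
        long size = normalize(requestMb, minAllocMb);
        if (size > maxAllocMb) {
            return 0; // YARN rejects requests above yarn.scheduler.maximum-allocation-mb
        }
        long total = 0;
        for (long nodeMb : nodeMemoryMb) {
            total += nodeMb / size; // whole containers only; leftover memory stays unused
        }
        return total;
    }
}
```

With no registered NodeManagers (an empty array) or nodes reporting 0 MB, the count is 0, so a -nodes 2 request could not be satisfied no matter how the H2O settings are tuned.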

Hadoop High-Availability (HA) Cluster

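℡╲_俬逩灬. submitted on 2019-12-30 11:40:10
Before Hadoop 2.0, the HDFS NameNode was a single point of failure. HA stands for high availability, i.e., uninterrupted 24/7 service. Strictly speaking, HA is split into the HA mechanism of each component: HDFS HA and YARN HA. HDFS HA solves the single point of failure by configuring two NameNodes, one Active and one Standby, so that the cluster keeps a hot standby of the NameNode. If a failure occurs (for example, the machine crashes or needs an upgrade or maintenance), HA can quickly switch the NameNode over to the other machine.
HA cluster configuration. Environment preparation: configure the hostnames and the hostname-to-IP mapping; disable the firewall; set up passwordless SSH login; install the JDK and configure its environment variables.
Configure the ZooKeeper cluster: extract ZooKeeper to the target directory with $ tar -zxvf zookeeper-3.4.10.tar.gz -C /export/servers; create zkData in /export/servers/zookeeper-3.4.10/ with mkdir -p zkData; rename zoo_sample.cfg in /export/servers/zookeeper-3.4.10/conf to zoo.cfg and edit it: mv zoo_sample.cfg zoo.cfg // specific configuration: dataDir=/export/servers/zookeeper-3.4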
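In this setup, the ZooKeeper ensemble is what coordinates automatic failover between the Active and Standby NameNodes. The sketch below (hypothetical class ZooCfgSketch) generates a zoo.cfg along the lines of the steps above: tickTime, initLimit, syncLimit, and clientPort mirror the stock zoo_sample.cfg, server.N=host:2888:3888 uses ZooKeeper's conventional peer and leader-election ports, and the hostnames and dataDir are placeholders you supply (the original dataDir path is cut off). It also shows why ensembles are usually sized at 3 or 5 servers.

```java
import java.util.List;

class ZooCfgSketch {
    // Builds a zoo.cfg for an ensemble. The first four values follow the stock zoo_sample.cfg;
    // dataDir and the hostnames are placeholders supplied by the caller.
    static String ensembleConfig(String dataDir, List<String> hosts) {
        StringBuilder cfg = new StringBuilder();
        cfg.append("tickTime=2000\n")
           .append("initLimit=10\n")
           .append("syncLimit=5\n")
           .append("clientPort=2181\n")
           .append("dataDir=").append(dataDir).append('\n');
        for (int i = 0; i < hosts.size(); i++) {
            // server.N: N must match the number in the myid file under that host's dataDir;
            // 2888 carries follower-to-leader traffic, 3888 is used for leader election.
            cfg.append("server.").append(i + 1).append('=')
               .append(hosts.get(i)).append(":2888:3888\n");
        }
        return cfg.toString();
    }

    // A majority quorum of n servers keeps serving while at most (n - 1) / 2 servers are down,
    // so a 4-server ensemble tolerates no more failures than a 3-server one.
    static int toleratedFailures(int n) {
        return (n - 1) / 2;
    }
}
```

Each host additionally needs a myid file in its dataDir containing its own N from the server.N lines.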

Oozie/yarn: resource changed on src filesystem

蹲街弑〆低调 submitted on 2019-12-30 10:17:10
Question: I have an Oozie workflow in which one of the steps is a Java step that runs a jar stored on the local filesystem (the jar is present on all nodes). Initially, the jar was installed via an RPM, so all copies have the same timestamp. While experimenting, I manually copied a new version over this jar, and I now get the message: org.apache.oozie.action.ActionExecutorException: JA009: org.apache.hadoop.yarn.exceptions.YarnException: Failed to submit application_1516602562532_15451 to YARN : Application
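The error in the title comes from YARN's resource localization, which, as far as I know, compares a resource's current modification time with the timestamp recorded when the application was submitted and refuses to localize it when they differ. Overwriting a node-local jar by hand changes its mtime on that node only, so copies that agreed after the RPM install no longer do. The JDK-only sketch below (hypothetical class TimestampCheck) reproduces that comparison in simplified form, which can help find the nodes holding a modified copy.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class TimestampCheck {
    // Simplified version of the localization check: the file's current modification time
    // must equal the timestamp (in epoch milliseconds) recorded at submission time.
    static boolean unchangedSince(Path file, long recordedMillis) throws IOException {
        long actualMillis = Files.getLastModifiedTime(file).toMillis();
        return actualMillis == recordedMillis;
    }
}
```

Running the check on every node against the same recorded value would flag exactly the copies that were overwritten.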

Hadoop Series, Part 4: Using MapReduce (II)

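随声附和 submitted on 2019-12-30 05:23:09
Please credit the author and the source prominently at the top of the page when reposting.
1: Notes
This is one of a series of posts on big data that will be updated as time permits, covering topics such as Hadoop, Spark, Storm, and machine learning. The Hadoop version used is 2.6.4. This is the second chapter on MapReduce; it covers computing common friends and recommending people you may know. Previous post: Hadoop Series, Part 3: Using MapReduce (I)
Contents:
1: Notes
2: Running MapReduce from an IDE (2.1: Running MapReduce in local mode; 2.2: Running on YARN from the IDE)
3: Implementing a join with MapReduce (3.1: An example in an SQL database; 3.2: How to implement it in MapReduce; 3.3: Creating the JavaBean; 3.4: Creating the mapper; 3.5: Creating the reducer; 3.6: Complete code; 3.7: Data skew)
4: Finding common friends and people you may know (4.1: Preparing the data; 4.2: Finding which people a given user is a friend of; 4.3: Computing common friends)
5: Using a GroupingComparator to compute the maximum of each group (5.1: Defining a JavaBean; 5.2: Defining a GroupingComparator; 5.3: Map code; 5.4: Reduce code; 5.5: Driver class)
6: Custom output location (6.1: A custom FileOutputFormat)
7: Custom input data
8: Global counters
9: Chaining multiple jobs and defining their execution order
10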
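The common-friends computation in section 4 is usually done as two chained MapReduce jobs. The plain-Java sketch below (hypothetical class CommonFriends, no Hadoop dependency) simulates both stages in memory so the data flow is easy to follow: stage 1 inverts the friend lists to find, for each user, whose lists contain them; stage 2 emits every sorted pair of those people together with the shared friend and groups by pair. It illustrates the technique and is not the post's own code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class CommonFriends {
    // Input: person -> that person's friend list, e.g. "A" -> [B, C, D].
    // Output: "X-Y" -> friends that X and Y have in common, sorted.
    static Map<String, List<String>> compute(Map<String, List<String>> friendLists) {
        // Stage 1 (map emits <friend, person>; reduce groups by friend):
        // for each friend, the people whose lists contain that friend.
        Map<String, List<String>> peopleByFriend = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : friendLists.entrySet()) {
            for (String friend : e.getValue()) {
                peopleByFriend.computeIfAbsent(friend, k -> new ArrayList<>()).add(e.getKey());
            }
        }
        // Stage 2 (map emits <person pair, friend>; reduce groups by pair).
        Map<String, List<String>> common = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : peopleByFriend.entrySet()) {
            List<String> people = new ArrayList<>(e.getValue());
            Collections.sort(people); // so that A-B and B-A become the same key
            for (int i = 0; i < people.size(); i++) {
                for (int j = i + 1; j < people.size(); j++) {
                    String pair = people.get(i) + "-" + people.get(j);
                    common.computeIfAbsent(pair, k -> new ArrayList<>()).add(e.getKey());
                }
            }
        }
        // Friends are visited in sorted (TreeMap) order, so each list is already sorted.
        return common;
    }
}
```

For the input A:[B, C, D], B:[A, C], C:[A, B, D], D:[A, C], the entry for A-C is [B, D] and the entry for B-D is [A, C].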

Installing and Configuring Yarn on Windows 10

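我们两清 submitted on 2019-12-30 03:02:32
Download it from the official website: https://www.yarnpkg.com/zh-Hans/docs/install#windows-stable
Installation: the default installation path is C:\Program Files (x86)\Yarn\. It is best to avoid a path containing spaces, e.g., install to C:\Yarn\, or choose another path such as D:\Yarn. Then click Next through the rest of the installer.
Verifying the installation: open any terminal (cmd, cmder, PowerShell, or Git Bash all work) and enter yarn -v. If a version number is printed, the installation succeeded.
Yarn is a package manager that can replace npm. You can also install yarn with npm; the result is the same.
Configuration: change yarn's default cache location (if you don't want the cache on the C: drive, you have to change it manually); see this tutorial: https://www.jianshu.com/p/1ab93268ddac
Configure the Taobao mirror (for faster downloads; switch to another one if you prefer): install yrm with yarn global add yrm (you can also install it with npm i -g yrm); list all mirrors with yrm ls; select the Taobao mirror with yrm use taobao. Packages you install afterwards will be downloaded from Taobao's servers.
Manually configuring the environment variable: copy yarn's installation directory.
End of article.
Source: CSDN Author: o0达达君0o Link: https://blog.csdn.net/Goo_12138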

/bin/bash: /bin/java: No such file or directory

大憨熊 submitted on 2019-12-30 02:08:05
Question: I was trying to run a simple WordCount MapReduce program using the Java 1.7 SDK and Hadoop 2.7.1 on Mac OS X El Capitan 10.11, and I am getting the following error message in my container log "stderr": /bin/bash: /bin/java: No such file or directory Application log: 5/11/27 02:52:33 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 15/11/27 02:52:33 INFO client.RMProxy: Connecting to ResourceManager at /192.168.200.96
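The message suggests that the container launch command was built from $JAVA_HOME/bin/java while JAVA_HOME was empty in the container's environment, so the path collapsed to /bin/java. The small sketch below (hypothetical class LaunchCommandSketch) shows that expansion using the same shell-like rule, where an unset variable becomes an empty string. A commonly reported remedy along these lines is to set JAVA_HOME explicitly, for example in hadoop-env.sh, instead of relying on the login shell.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LaunchCommandSketch {
    // Matches $NAME references (letters, digits, underscores; not starting with a digit).
    private static final Pattern VAR = Pattern.compile("\\$([A-Za-z_][A-Za-z0-9_]*)");

    // Expands $NAME references like a shell would, substituting "" for unset variables.
    static String expand(String template, Map<String, String> env) {
        Matcher m = VAR.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = env.getOrDefault(m.group(1), "");
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

With an empty environment, "$JAVA_HOME/bin/java" expands to exactly the /bin/java path from the error.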

Notes on Problems Encountered Running Spark on YARN

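无人久伴 submitted on 2019-12-29 16:47:22
1: The job keeps retrying master:8032 after it starts. Analysis: YARN may not have been started. Solution: check whether the services are running; use the jps command to see whether a ResourceManager process exists.
2: mory used; 2.2 GB of 2.1 GB virtual memory used. Killing container. Analysis: this looked like a memory shortage, since I was using a virtual machine with only 1 GB of memory allocated. After searching on Baidu, it turned out not to be a physical memory problem: YARN checks virtual memory when running containers, and if the virtual memory is insufficient it reports this error. Solution: in {hadoopdir}/etc/hadoop/yarn-site.xml, set the virtual memory check property to false, as follows: <property> <name>yarn.nodemanager.vmem-check-enabled</name> <value>false</value> </property>
3: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME. This does not affect execution, but every time a job runs on YARN, the jars in the Spark directory are uploaded to HDFS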
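The numbers in problem 2 follow from how the NodeManager derives a virtual memory limit: the container's physical allocation multiplied by yarn.nodemanager.vmem-pmem-ratio, which defaults to 2.1. A 2.1 GB limit is therefore consistent with a 1 GB container, and 2.2 GB of virtual memory exceeds it. The sketch below (hypothetical class VmemCheck) encodes that arithmetic; setting yarn.nodemanager.vmem-check-enabled to false, as above, switches the check off, while raising the ratio is the usual alternative.

```java
class VmemCheck {
    // Default of yarn.nodemanager.vmem-pmem-ratio in yarn-default.xml.
    static final double DEFAULT_VMEM_PMEM_RATIO = 2.1;

    // Virtual memory limit derived from the container's physical allocation.
    static double vmemLimitGb(double containerPhysicalGb, double ratio) {
        return containerPhysicalGb * ratio;
    }

    // True when the NodeManager's virtual memory check would kill the container;
    // the check only runs while yarn.nodemanager.vmem-check-enabled is true.
    static boolean wouldBeKilled(double usedVirtualGb, double containerPhysicalGb,
                                 double ratio, boolean vmemCheckEnabled) {
        return vmemCheckEnabled && usedVirtualGb > vmemLimitGb(containerPhysicalGb, ratio);
    }
}
```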

GCP Dataproc - configure YARN fair scheduler

北城以北 submitted on 2019-12-29 09:07:58
Question: I was trying to set up a Dataproc cluster that would run only one job (or a specified maximum number of jobs) at a time, with the rest waiting in a queue. I found this solution, How to configure monopolistic FIFO application queue in YARN?, but since I always create a new cluster, I needed to automate this. I added this to the cluster creation request: "softwareConfig": { "properties": { "yarn:yarn.resourcemanager.scheduler.class":"org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler
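With the Fair Scheduler, concurrency is usually limited in the allocation file rather than in yarn-site.xml: a queue's maxRunningApps caps how many applications run at once, and further applications wait. The sketch below (hypothetical class FairAllocationSketch) generates such a file for a single queue. The queue name is a placeholder and is assumed to be a plain alphanumeric name (it is not XML-escaped); yarn.scheduler.fair.allocation.file would need to point at the generated file, and how the file is placed on a new Dataproc cluster is not shown here.

```java
class FairAllocationSketch {
    // Builds a minimal Fair Scheduler allocation file with one queue that runs at most
    // maxRunningApps applications at a time, in FIFO order; the rest wait in the queue.
    static String allocationXml(String queueName, int maxRunningApps) {
        if (maxRunningApps < 1) {
            throw new IllegalArgumentException("maxRunningApps must be >= 1");
        }
        return "<?xml version=\"1.0\"?>\n"
             + "<allocations>\n"
             + "  <queue name=\"" + queueName + "\">\n"
             + "    <maxRunningApps>" + maxRunningApps + "</maxRunningApps>\n"
             + "    <schedulingPolicy>fifo</schedulingPolicy>\n"
             + "  </queue>\n"
             + "</allocations>\n";
    }
}
```

Calling allocationXml("default", 1) produces a file that admits one running application at a time in that queue.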

What is the difference between executing a MapReduce job with the hadoop command and with the java command?

百般思念 submitted on 2019-12-29 07:34:09
Question: I have found several ways to run a MapReduce program. Can anyone explain the difference between the commands below, and what impact, if any, each has on the MapReduce job? java -jar MyMapReduce.jar [args] hadoop jar MyMapReduce.jar [args] yarn jar MyMapReduce.jar [args] Which of these commands is best, or is there another option? Can the configuration be set up so that all information about the job is displayed through YARN and the Job History server (as it is when using the hadoop and yarn commands) in the web UI, normally on port 8088 (YARN) on
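The main practical difference between java -jar and hadoop jar / yarn jar is the classpath. The hadoop and yarn launchers add the Hadoop jars and the configuration directory, so core-site.xml, mapred-site.xml, and yarn-site.xml can be found; java -jar sees only the jar itself (plus any Class-Path entries in its manifest). Without mapred-site.xml, mapreduce.framework.name keeps its default of local, so the job runs in-process and does not appear in the ResourceManager UI on port 8088. The JDK-only sketch below (hypothetical class ClasspathConfigCheck) reports which of those files the running JVM can actually see.

```java
import java.net.URL;

class ClasspathConfigCheck {
    // Site files that Hadoop's configuration classes look up on the classpath.
    static final String[] SITE_FILES = {"core-site.xml", "mapred-site.xml", "yarn-site.xml"};

    // True if the class loader can locate the named resource.
    static boolean visible(ClassLoader loader, String resource) {
        URL url = loader.getResource(resource);
        return url != null;
    }

    public static void main(String[] args) {
        ClassLoader loader = ClasspathConfigCheck.class.getClassLoader();
        for (String file : SITE_FILES) {
            System.out.println(file + (visible(loader, file) ? ": found" : ": not on the classpath"));
        }
    }
}
```

Packaged into a jar and launched once with java -jar and once with hadoop jar, it would be expected to report the site files as missing in the first case and found in the second, assuming the Hadoop configuration directory holds them.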