yarn

Add Yarn cluster configuration to Spark application

荒凉一梦 submitted on 2019-12-07 03:28:59
Question: I'm trying to use Spark on YARN from a Scala sbt application instead of using spark-submit directly. I already have a remote YARN cluster running, and I can connect to it and run Spark jobs from SparkR. But when I try to do a similar thing in a Scala application, it does not load my environment variables into the YARN configuration and instead uses the default YARN address and port. The sbt application is just a simple object: object simpleSparkApp { def main(args: Array[String]): Unit = { val conf =
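For reference, a minimal sketch (assuming Spark 2.x; names and paths are illustrative) of how a Scala sbt application usually picks up a remote YARN cluster: the client reads yarn-site.xml/core-site.xml from the directory named by HADOOP_CONF_DIR or YARN_CONF_DIR, so that variable has to be set in the environment of the JVM that creates the SparkContext; otherwise the client falls back to the default ResourceManager address 0.0.0.0:8032.

import org.apache.spark.{SparkConf, SparkContext}

object simpleSparkApp {
  def main(args: Array[String]): Unit = {
    // Export HADOOP_CONF_DIR (or YARN_CONF_DIR) to point at the remote
    // cluster's client configuration before starting this JVM, e.g.
    //   export HADOOP_CONF_DIR=/path/to/remote-cluster-conf
    val conf = new SparkConf()
      .setAppName("simpleSparkApp")
      .setMaster("yarn")        // "yarn-client" on Spark 1.x
    val sc = new SparkContext(conf)
    println(sc.parallelize(1 to 100).reduce(_ + _))
    sc.stop()
  }
}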

Why does YARN report a Java heap space memory error?

一个人想着一个人 submitted on 2019-12-06 23:54:02
Question: I want to experiment with memory settings in YARN, so I am trying to configure some parameters in yarn-site.xml and mapred-site.xml. I am using Hadoop 2.6.0. But I get an error when I run a MapReduce job. It says: 15/03/12 10:57:23 INFO mapreduce.Job: Task Id : attempt_1426132548565_0001_m_000002_0, Status : FAILED Error: Java heap space Container killed by the ApplicationMaster. Container killed on request. Exit code is 143 Container exited with a non-zero exit code 143 I think that I
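The "Java heap space" error together with exit code 143 usually means the map task's JVM heap is too small for the job, or the heap (-Xmx) and the container size are out of proportion. A hedged sketch of the settings involved, with illustrative values only (the heap is conventionally kept at roughly 80% of the container memory):

<!-- mapred-site.xml -->
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>1536</value>
</property>
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx1228m</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>3072</value>
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx2457m</value>
</property>

In yarn-site.xml, yarn.scheduler.maximum-allocation-mb must be at least as large as the container sizes requested above.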

How to control and monitor map/reduce concurrency in YARN

戏子无情 submitted on 2019-12-06 18:24:08
Configuration advice: 1. In MR1, the mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum properties dictated how many map and reduce slots each TaskTracker had. These properties no longer exist in YARN. Instead, YARN uses yarn.nodemanager.resource.memory-mb and yarn.nodemanager.resource.cpu-vcores, which control the amount of memory and CPU on each node, both available to both maps and reduces. Essentially: YARN has no TaskTrackers, just generic NodeManagers. Hence, there is no longer a separation of map slots and reduce slots. Everything depends on the amount of memory in use/demanded
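As a sketch of how those properties interact (the values are illustrative, not recommendations): the number of map or reduce containers that can run concurrently on a node is roughly the node's memory divided by the per-task memory, further capped by vcores and by the scheduler.

<!-- yarn-site.xml: resources each NodeManager advertises -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>16384</value>
</property>
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>8</value>
</property>

<!-- mapred-site.xml: what each map/reduce task requests -->
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>2048</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>4096</value>
</property>

With these numbers a node can run at most 16384/2048 = 8 maps or 16384/4096 = 4 reduces at once; the actual running counts can be monitored in the ResourceManager web UI (port 8088 by default).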

How to submit a MapReduce job with the YARN API in Java

扶醉桌前 submitted on 2019-12-06 16:36:47
I want to submit my MR job using the YARN Java API. I tried to follow WritingYarnApplications, but I don't know what to add to amContainer. Below is the code I have written: package org.apache.hadoop.examples; import org.apache.hadoop.conf.Configuration; import org.apache.hadoop.yarn.api.protocolrecords.GetNewApplicationResponse; import org.apache.hadoop.yarn.api.records.ApplicationId; import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext; import org.apache.hadoop.yarn.api.records.ContainerLaunchContext; import org.apache.hadoop.yarn.api.records.Resource; import org.apache.hadoop.yarn
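For context, a minimal sketch (in Scala, against the same Hadoop YARN client API; the class com.example.MyApplicationMaster and all values are hypothetical) of the skeleton that WritingYarnApplications describes: amContainer is the ContainerLaunchContext for your ApplicationMaster, i.e. its launch command plus any local resources and environment it needs. For a standard MapReduce job it is usually much simpler to call Job.submit(), which builds the AM container (MRAppMaster, job.jar, job.xml, tokens) for you.

import java.util.Collections

import org.apache.hadoop.yarn.api.ApplicationConstants
import org.apache.hadoop.yarn.api.records.{ContainerLaunchContext, Resource}
import org.apache.hadoop.yarn.client.api.YarnClient
import org.apache.hadoop.yarn.conf.YarnConfiguration
import org.apache.hadoop.yarn.util.Records

object SubmitToYarn {
  def main(args: Array[String]): Unit = {
    val yarnClient = YarnClient.createYarnClient()
    yarnClient.init(new YarnConfiguration())
    yarnClient.start()

    // Ask the ResourceManager for a new application.
    val app = yarnClient.createApplication()
    val appContext = app.getApplicationSubmissionContext

    // amContainer: how to launch the ApplicationMaster process.
    val amContainer = Records.newRecord(classOf[ContainerLaunchContext])
    amContainer.setCommands(Collections.singletonList(
      "$JAVA_HOME/bin/java -Xmx512m com.example.MyApplicationMaster" +
        " 1>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stdout" +
        " 2>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stderr"))
    // A real application would also register local resources (the AM jar)
    // and set the CLASSPATH environment for the container here.

    val amResource = Records.newRecord(classOf[Resource])
    amResource.setMemory(1024)
    amResource.setVirtualCores(1)

    appContext.setApplicationName("yarn-api-example")
    appContext.setAMContainerSpec(amContainer)
    appContext.setResource(amResource)
    appContext.setQueue("default")

    yarnClient.submitApplication(appContext)
  }
}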

Installing Hadoop 2.7 on CentOS 7

我只是一个虾纸丫 submitted on 2019-12-06 16:35:02
Preparation: 1. Three CentOS 7 machines; add hostname entries for all of them to /etc/hosts on every machine: 172.20.0.4  node1 172.20.0.5  node2 172.20.0.6  node3 2. Set up passwordless SSH from node1 to all three machines. 3. Install JDK 8 on all of them. 4. Download the release package hadoop-2.7.7.tar.gz from the official site (the USTC open-source mirror is recommended: http://mirrors.ustc.edu.cn/apache/hadoop/common/hadoop-2.7.7/hadoop-2.7.7.tar.gz). 5. Plan: node1 serves as NameNode and DataNode; node2 and node3 serve as DataNodes. Configuration: create the directory /mydata/ on all three machines and configure the environment variables: export HADOOP_HOME=/mydata/hadoop-2.7.7 export PATH=${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:$PATH Then modify the Hadoop configuration on node1: extract hadoop-2.7.7.tar.gz into /mydata/, go to /mydata/hadoop-2.7.7/etc/hadoop/, and edit the following files (for some of them you need to remove the .template suffix, or copy and rename a template): <!-- file: core-site.xml -->
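The excerpt breaks off at core-site.xml. A minimal sketch of what that file typically contains for this layout (node1 as the NameNode); the port 9000 and the temporary directory are illustrative assumptions, not values from the original post:

<!-- core-site.xml -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://node1:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/mydata/hadoop-tmp</value>
  </property>
</configuration>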

Setting up YARN shuffle for Spark makes spark-shell fail to start

六眼飞鱼酱① submitted on 2019-12-06 16:03:08
I have a cluster of four Ubuntu 14.04 machines on which I am setting up Spark 2.1.0 (prebuilt for Hadoop 2.7) to run on top of Hadoop 2.7.3, and I am configuring it to work with YARN. Running jps on each node I get: node-1 22546 Master 22260 ResourceManager 22916 Jps 21829 NameNode 22091 SecondaryNameNode node-2 12321 Worker 12485 Jps 11978 DataNode node-3 15938 Jps 15764 Worker 15431 DataNode node-4 12251 Jps 12075 Worker 11742 DataNode Without the yarn shuffle configuration, ./bin/spark-shell --master yarn --deploy-mode client starts just fine when called on node-1. In order to configure an External
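For context, the external shuffle service for Spark on YARN is normally enabled by adding something like the following to yarn-site.xml on every NodeManager and placing $SPARK_HOME/yarn/spark-<version>-yarn-shuffle.jar on the NodeManager classpath; a typo in the aux-service name or class, or a missing jar, tends to crash the NodeManagers and can leave spark-shell unable to acquire containers. This is a general sketch, not the poster's exact configuration:

<!-- yarn-site.xml on each NodeManager -->
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle,spark_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
  <value>org.apache.spark.network.yarn.YarnShuffleService</value>
</property>

On the Spark side, spark.shuffle.service.enabled=true is set in spark-defaults.conf (or passed with --conf).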

Spark creates fewer partitions than the minPartitions argument on wholeTextFiles

徘徊边缘 submitted on 2019-12-06 13:46:59
I have a folder which has 14 files in it. I run spark-submit with 10 executors on a cluster whose resource manager is YARN. I create my first RDD like this: JavaPairRDD<String,String> files = sc.wholeTextFiles(folderPath.toString(), 10); However, files.getNumPartitions() gives me 7 or 8, randomly. Then I do not use coalesce/repartition anywhere and I finish my DAG with 7-8 partitions. As I know, we gave the argument as the "minimum" number of partitions, so why does Spark divide my RDD into 7-8 partitions? I also ran the same program with 20 partitions and it gave me 11 partitions. I have
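For context: wholeTextFiles is backed by a combine-file input format that packs whole files into splits using a maximum split size derived from the total input size and minPartitions, so the argument is only a hint and the resulting partition count can be lower. A hedged Scala sketch of checking the count and forcing it afterwards when a fixed parallelism is needed (the path is illustrative):

import org.apache.spark.{SparkConf, SparkContext}

object WholeTextFilesPartitions {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("wholeTextFiles-partitions"))

    // minPartitions is only a hint: small files may be packed into fewer splits.
    val files = sc.wholeTextFiles("/path/to/folder", 10)
    println(s"partitions from wholeTextFiles: ${files.getNumPartitions}")

    // If a fixed parallelism is required, repartition explicitly (adds a shuffle).
    val repartitioned = files.repartition(10)
    println(s"after repartition: ${repartitioned.getNumPartitions}")

    sc.stop()
  }
}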

vue-cli 3.0: using px2rem and lib-flexible for mobile adaptation

≯℡__Kan透↙ submitted on 2019-12-06 12:10:34
lib-flexible — purpose: lets the page adapt to mobile devices of different sizes by setting the html root element's font-size from the device's dpr and width via the viewport, so that rem units scale with the screen. Install: yarn add lib-flexible. Import in the entry file main.js: import 'lib-flexible/flexible.js' pxtorem — purpose: converts px values in the project's CSS to rem units, sparing you the arithmetic. Install: yarn add postcss-pxtorem. Configuration: in package.json, add under postcss: "postcss": { "plugins": { "autoprefixer": {}, "postcss-pxtorem": { "rootValue": 75, // 1/10 of the design draft width (JSON does not allow comments; delete this comment and the next one) "propList": ["*"] // properties to convert, e.g. height, width, margin; "*" means all } } } TIPS: 1. With pxtorem, write px values you want to ignore in uppercase, e.g. border: 1PX solid #fff; 2. You can also choose postcss-px2rem, but I prefer pxtorem's way of ignoring values, which plays nicely with auto-formatting by Beautify in VS Code. Source: oschina Link: https://my.oschina

Configuring yarn to use the Alibaba/Taobao mirrors

久未见 submitted on 2019-12-06 10:07:35
1. Check the current registry:
yarn config get registry
2. Switch to the Taobao mirrors:
yarn config set registry https://registry.npm.taobao.org
yarn config set sass_binary_site "https://npm.taobao.org/mirrors/node-sass/"
yarn config set phantomjs_cdnurl "http://cnpmjs.org/downloads"
yarn config set electron_mirror "https://npm.taobao.org/mirrors/electron/"
yarn config set sqlite3_binary_host_mirror "https://foxgis.oss-cn-shanghai.aliyuncs.com/"
yarn config set profiler_binary_host_mirror "https://npm.taobao.org/mirrors/node-inspector/"
yarn config set chromedriver_cdnurl "https://cdn.npm.taobao.org/dist/chromedriver"

Spark coalesce relationship with number of executors and cores

喜夏-厌秋 submitted on 2019-12-06 09:34:09
Question: I'm bringing up a very silly question about Spark because I want to clear up my confusion. I'm very new to Spark and still trying to understand how it works internally. Say I have a list of input files (assume 1000) which I want to process or write somewhere, and I want to use coalesce to reduce my partition number to 100. Now I run this job with 12 executors with 5 cores per executor, which means 60 tasks when it runs. Does that mean each of the tasks will work on one single partition independently
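For reference, a minimal Scala sketch of the situation described (paths and numbers are illustrative): coalesce(100) gives the final stage 100 tasks, and 12 executors x 5 cores provide 60 task slots, so 60 tasks run concurrently, each processing exactly one partition, while the remaining 40 wait for a free core.

import org.apache.spark.{SparkConf, SparkContext}

object CoalesceExample {
  def main(args: Array[String]): Unit = {
    // Illustrative submit-time settings: 12 executors, 5 cores each = 60 task slots.
    val sc = new SparkContext(new SparkConf().setAppName("coalesce-example"))

    val input = sc.textFile("/path/to/1000/input/files")   // roughly 1000 initial partitions
    val coalesced = input.coalesce(100)                     // 100 partitions -> 100 tasks

    // 60 tasks run at the same time (one per core); each task handles one
    // partition, and the remaining 40 tasks queue until a core frees up.
    coalesced.saveAsTextFile("/path/to/output")
    sc.stop()
  }
}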