yarn

Add Yarn cluster configuration to Spark application

荒凉一梦 submitted on 2019-12-07 03:28:59
Question: I'm trying to use Spark on YARN from a Scala sbt application instead of using spark-submit directly. I already have a remote YARN cluster running, and I can connect to it and run Spark jobs from SparkR. But when I try to do a similar thing in a Scala application, it does not load my environment variables into the YARN configuration and instead uses the default YARN address and port. The sbt application is just a simple object: object simpleSparkApp { def main(args: Array[String]): Unit = { val conf =
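For reference, a minimal sketch (assuming Spark 2.x; names and paths are illustrative) of how a Scala sbt application usually picks up a remote YARN cluster: the client reads yarn-site.xml/core-site.xml from the directory named by HADOOP_CONF_DIR or YARN_CONF_DIR, so that variable has to be set in the environment of the JVM that creates the SparkContext; otherwise the client falls back to the default ResourceManager address 0.0.0.0:8032.

import org.apache.spark.{SparkConf, SparkContext}

object simpleSparkApp {
  def main(args: Array[String]): Unit = {
    // Export HADOOP_CONF_DIR (or YARN_CONF_DIR) to point at the remote
    // cluster's client configuration before starting this JVM, e.g.
    //   export HADOOP_CONF_DIR=/path/to/remote-cluster-conf
    val conf = new SparkConf()
      .setAppName("simpleSparkApp")
      .setMaster("yarn")        // "yarn-client" on Spark 1.x
    val sc = new SparkContext(conf)
    println(sc.parallelize(1 to 100).reduce(_ + _))
    sc.stop()
  }
}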

Why does YARN report a Java heap space memory error?

一个人想着一个人 submitted on 2019-12-06 23:54:02
Question: I want to experiment with memory settings in YARN, so I am trying to configure some parameters in yarn-site.xml and mapred-site.xml. I am using Hadoop 2.6.0. But I get an error when I run a MapReduce job. It says: 15/03/12 10:57:23 INFO mapreduce.Job: Task Id : attempt_1426132548565_0001_m_000002_0, Status : FAILED Error: Java heap space Container killed by the ApplicationMaster. Container killed on request. Exit code is 143 Container exited with a non-zero exit code 143 I think that I
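The "Java heap space" error together with exit code 143 usually means the map task's JVM heap is too small for the job, or the heap (-Xmx) and the container size are out of proportion. A hedged sketch of the settings involved, with illustrative values only (the heap is conventionally kept at roughly 80% of the container memory):

<!-- mapred-site.xml -->
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>1536</value>
</property>
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx1228m</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>3072</value>
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx2457m</value>
</property>

In yarn-site.xml, yarn.scheduler.maximum-allocation-mb must be at least as large as the container sizes requested above.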

How to control and monitor map/reduce concurrency in YARN

戏子无情 submitted on 2019-12-06 18:24:08
Configuration advice: 1. In MR1, the mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum properties dictated how many map and reduce slots each TaskTracker had. These properties no longer exist in YARN. Instead, YARN uses yarn.nodemanager.resource.memory-mb and yarn.nodemanager.resource.cpu-vcores, which control the amount of memory and CPU on each node, both available to both maps and reduces. Essentially: YARN has no TaskTrackers, just generic NodeManagers. Hence, there is no longer a separation of map slots and reduce slots. Everything depends on the amount of memory in use/demanded
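As a sketch of how those properties interact (the values are illustrative, not recommendations): the number of map or reduce containers that can run concurrently on a node is roughly the node's memory divided by the per-task memory, further capped by vcores and by the scheduler.

<!-- yarn-site.xml: resources each NodeManager advertises -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>16384</value>
</property>
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>8</value>
</property>

<!-- mapred-site.xml: what each map/reduce task requests -->
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>2048</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>4096</value>
</property>

With these numbers a node can run at most 16384/2048 = 8 maps or 16384/4096 = 4 reduces at once; the actual running counts can be monitored in the ResourceManager web UI (port 8088 by default).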

How to submit a MapReduce job with the YARN API in Java

扶醉桌前 submitted on 2019-12-06 16:36:47
I want to submit my MR job using the YARN Java API. I tried to follow WritingYarnApplications, but I don't know what to add to amContainer. Below is the code I have written: package org.apache.hadoop.examples; import org.apache.hadoop.conf.Configuration; import org.apache.hadoop.yarn.api.protocolrecords.GetNewApplicationResponse; import org.apache.hadoop.yarn.api.records.ApplicationId; import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext; import org.apache.hadoop.yarn.api.records.ContainerLaunchContext; import org.apache.hadoop.yarn.api.records.Resource; import org.apache.hadoop.yarn
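For context, a minimal sketch (in Scala, against the same Hadoop YARN client API; the class com.example.MyApplicationMaster and all values are hypothetical) of the skeleton that WritingYarnApplications describes: amContainer is the ContainerLaunchContext for your ApplicationMaster, i.e. its launch command plus any local resources and environment it needs. For a standard MapReduce job it is usually much simpler to call Job.submit(), which builds the AM container (MRAppMaster, job.jar, job.xml, tokens) for you.

import java.util.Collections

import org.apache.hadoop.yarn.api.ApplicationConstants
import org.apache.hadoop.yarn.api.records.{ContainerLaunchContext, Resource}
import org.apache.hadoop.yarn.client.api.YarnClient
import org.apache.hadoop.yarn.conf.YarnConfiguration
import org.apache.hadoop.yarn.util.Records

object SubmitToYarn {
  def main(args: Array[String]): Unit = {
    val yarnClient = YarnClient.createYarnClient()
    yarnClient.init(new YarnConfiguration())
    yarnClient.start()

    // Ask the ResourceManager for a new application.
    val app = yarnClient.createApplication()
    val appContext = app.getApplicationSubmissionContext

    // amContainer: how to launch the ApplicationMaster process.
    val amContainer = Records.newRecord(classOf[ContainerLaunchContext])
    amContainer.setCommands(Collections.singletonList(
      "$JAVA_HOME/bin/java -Xmx512m com.example.MyApplicationMaster" +
        " 1>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stdout" +
        " 2>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stderr"))
    // A real application would also register local resources (the AM jar)
    // and set the CLASSPATH environment for the container here.

    val amResource = Records.newRecord(classOf[Resource])
    amResource.setMemory(1024)
    amResource.setVirtualCores(1)

    appContext.setApplicationName("yarn-api-example")
    appContext.setAMContainerSpec(amContainer)
    appContext.setResource(amResource)
    appContext.setQueue("default")

    yarnClient.submitApplication(appContext)
  }
}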

Installing Hadoop 2.7 on CentOS 7

我只是一个虾纸丫 submitted on 2019-12-06 16:35:02
Preparation: 1. Three CentOS 7 machines; add hostname entries for all of them to /etc/hosts on every machine: 172.20.0.4  node1 172.20.0.5  node2 172.20.0.6  node3 2. Set up passwordless SSH from node1 to all three machines. 3. Install JDK 8 on all of them. 4. Download the release package hadoop-2.7.7.tar.gz from the official site (the USTC open-source mirror is recommended: http://mirrors.ustc.edu.cn/apache/hadoop/common/hadoop-2.7.7/hadoop-2.7.7.tar.gz). 5. Plan: node1 serves as NameNode and DataNode; node2 and node3 serve as DataNodes. Configuration: create the directory /mydata/ on all three machines and configure the environment variables: export HADOOP_HOME=/mydata/hadoop-2.7.7 export PATH=${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:$PATH Then modify the Hadoop configuration on node1: extract hadoop-2.7.7.tar.gz into /mydata/, go to /mydata/hadoop-2.7.7/etc/hadoop/, and edit the following files (for some of them you need to remove the .template suffix, or copy and rename a template): <!-- file: core-site.xml -->
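The excerpt breaks off at core-site.xml. A minimal sketch of what that file typically contains for this layout (node1 as the NameNode); the port 9000 and the temporary directory are illustrative assumptions, not values from the original post:

<!-- core-site.xml -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://node1:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/mydata/hadoop-tmp</value>
  </property>
</configuration>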

Setting up YARN shuffle for Spark makes spark-shell fail to start

六眼飞鱼酱① submitted on 2019-12-06 16:03:08
I have a cluster of four Ubuntu 14.04 machines on which I am setting up Spark 2.1.0 (prebuilt for Hadoop 2.7) to run on top of Hadoop 2.7.3, and I am configuring it to work with YARN. Running jps on each node I get: node-1 22546 Master 22260 ResourceManager 22916 Jps 21829 NameNode 22091 SecondaryNameNode node-2 12321 Worker 12485 Jps 11978 DataNode node-3 15938 Jps 15764 Worker 15431 DataNode node-4 12251 Jps 12075 Worker 11742 DataNode Without the yarn shuffle configuration, ./bin/spark-shell --master yarn --deploy-mode client starts just fine when called on node-1. In order to configure an External
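For context, the external shuffle service for Spark on YARN is normally enabled by adding something like the following to yarn-site.xml on every NodeManager and placing $SPARK_HOME/yarn/spark-<version>-yarn-shuffle.jar on the NodeManager classpath; a typo in the aux-service name or class, or a missing jar, tends to crash the NodeManagers and can leave spark-shell unable to acquire containers. This is a general sketch, not the poster's exact configuration:

<!-- yarn-site.xml on each NodeManager -->
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle,spark_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
  <value>org.apache.spark.network.yarn.YarnShuffleService</value>
</property>

On the Spark side, spark.shuffle.service.enabled=true is set in spark-defaults.conf (or passed with --conf).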

Spark creates fewer partitions than the minPartitions argument on wholeTextFiles

徘徊边缘 submitted on 2019-12-06 13:46:59
I have a folder which has 14 files in it. I run spark-submit with 10 executors on a cluster whose resource manager is YARN. I create my first RDD like this: JavaPairRDD<String,String> files = sc.wholeTextFiles(folderPath.toString(), 10); However, files.getNumPartitions() gives me 7 or 8, randomly. Then I do not use coalesce/repartition anywhere and I finish my DAG with 7-8 partitions. As I know, we gave the argument as the "minimum" number of partitions, so why does Spark divide my RDD into 7-8 partitions? I also ran the same program with 20 partitions and it gave me 11 partitions. I have
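For context: wholeTextFiles is backed by a combine-file input format that packs whole files into splits using a maximum split size derived from the total input size and minPartitions, so the argument is only a hint and the resulting partition count can be lower. A hedged Scala sketch of checking the count and forcing it afterwards when a fixed parallelism is needed (the path is illustrative):

import org.apache.spark.{SparkConf, SparkContext}

object WholeTextFilesPartitions {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("wholeTextFiles-partitions"))

    // minPartitions is only a hint: small files may be packed into fewer splits.
    val files = sc.wholeTextFiles("/path/to/folder", 10)
    println(s"partitions from wholeTextFiles: ${files.getNumPartitions}")

    // If a fixed parallelism is required, repartition explicitly (adds a shuffle).
    val repartitioned = files.repartition(10)
    println(s"after repartition: ${repartitioned.getNumPartitions}")

    sc.stop()
  }
}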

vue-cli 3.0: using px2rem and lib-flexible for mobile adaptation

≯℡__Kan透↙ submitted on 2019-12-06 12:10:34
lib-flexible — purpose: lets the page adapt to mobile devices of different sizes by setting the html root element's font-size from the device's dpr and width via the viewport, so that rem units scale with the screen. Install: yarn add lib-flexible. Import in the entry file main.js: import 'lib-flexible/flexible.js' pxtorem — purpose: converts px values in the project's CSS to rem units, sparing you the arithmetic. Install: yarn add postcss-pxtorem. Configuration: in package.json, add under postcss: "postcss": { "plugins": { "autoprefixer": {}, "postcss-pxtorem": { "rootValue": 75, // 1/10 of the design draft width (JSON does not allow comments; delete this comment and the next one) "propList": ["*"] // properties to convert, e.g. height, width, margin; "*" means all } } } TIPS: 1. With pxtorem, write px values you want to ignore in uppercase, e.g. border: 1PX solid #fff; 2. You can also choose postcss-px2rem, but I prefer pxtorem's way of ignoring values, which plays nicely with auto-formatting by Beautify in VS Code. Source: oschina Link: https://my.oschina

Configuring yarn to use the Alibaba/Taobao mirrors

久未见 submitted on 2019-12-06 10:07:35
1. Check the current registry:
yarn config get registry
2. Switch to the Taobao mirrors:
yarn config set registry https://registry.npm.taobao.org
yarn config set sass_binary_site "https://npm.taobao.org/mirrors/node-sass/"
yarn config set phantomjs_cdnurl "http://cnpmjs.org/downloads"
yarn config set electron_mirror "https://npm.taobao.org/mirrors/electron/"
yarn config set sqlite3_binary_host_mirror "https://foxgis.oss-cn-shanghai.aliyuncs.com/"
yarn config set profiler_binary_host_mirror "https://npm.taobao.org/mirrors/node-inspector/"
yarn config set chromedriver_cdnurl "https://cdn.npm.taobao.org/dist/chromedriver"

Spark coalesce relationship with number of executors and cores

喜夏-厌秋 submitted on 2019-12-06 09:34:09
Question: I'm bringing up a very silly question about Spark because I want to clear up my confusion. I'm very new to Spark and still trying to understand how it works internally. Say I have a list of input files (assume 1000) which I want to process or write somewhere, and I want to use coalesce to reduce my partition number to 100. Now I run this job with 12 executors with 5 cores per executor, which means 60 tasks when it runs. Does that mean each of the tasks will work on one single partition independently
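For reference, a minimal Scala sketch of the situation described (paths and numbers are illustrative): coalesce(100) gives the final stage 100 tasks, and 12 executors x 5 cores provide 60 task slots, so 60 tasks run concurrently, each processing exactly one partition, while the remaining 40 wait for a free core.

import org.apache.spark.{SparkConf, SparkContext}

object CoalesceExample {
  def main(args: Array[String]): Unit = {
    // Illustrative submit-time settings: 12 executors, 5 cores each = 60 task slots.
    val sc = new SparkContext(new SparkConf().setAppName("coalesce-example"))

    val input = sc.textFile("/path/to/1000/input/files")   // roughly 1000 initial partitions
    val coalesced = input.coalesce(100)                     // 100 partitions -> 100 tasks

    // 60 tasks run at the same time (one per core); each task handles one
    // partition, and the remaining 40 tasks queue until a core frees up.
    coalesced.saveAsTextFile("/path/to/output")
    sc.stop()
  }
}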