
Plain authentication failed: User yarn is not configured for any impersonation. impersonationUser: root in alluxio mapreduce

魔方 西西 submitted on 2019-12-24 10:59:20
Question: Caused by: org.apache.thrift.transport.TTransportException: Plain authentication failed: User yarn is not configured for any impersonation. impersonationUser: root. It works fine when I run the wordcount program locally with Alluxio, and the integration test also passes, but when I run the same Hadoop program with the Alluxio client jar it gives me this error:

bin/hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar wordcount -libjars /usr/lib/hadoop-mapreduce/alluxio-1.8.1-client.jar alluxio
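A minimal sketch of the usual fix, assuming Alluxio 1.8's impersonation properties: the Alluxio master must be told that the yarn user (which runs the MapReduce containers) is allowed to impersonate other users, for example in alluxio-site.properties on the masters:

# Allow the "yarn" user to impersonate any user/group; restrict "*" as needed.
alluxio.master.security.impersonation.yarn.users=*
alluxio.master.security.impersonation.yarn.groups=*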

How to change the tmp directory in YARN

一个人想着一个人 submitted on 2019-12-24 03:53:14
Question: I have written an MR job and run it in local mode on Hadoop 1.x with the following configuration settings:

mapred.local.dir=<<local directory having a good amount of space>>
fs.default.name=file:///
mapred.job.tracker=local

Now I am using Hadoop 2.x (the Hadoop 2.6 jars) and running the same job with the same configuration settings, but I get the error: Disk Out of Space. If I switch from Hadoop 1.x to 2.x, do the same configuration settings for changing the tmp dir stop working?
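A hedged sketch of the likely cause: Hadoop 2.x deprecated the 1.x property names, so the old keys may be silently ignored and the defaults (under /tmp) fill up. The usual 2.x equivalents, set in the job configuration, are:

<!-- fs.default.name -> fs.defaultFS, mapred.job.tracker -> mapreduce.framework.name,
     mapred.local.dir -> mapreduce.cluster.local.dir; the path below is a placeholder -->
<property><name>fs.defaultFS</name><value>file:///</value></property>
<property><name>mapreduce.framework.name</name><value>local</value></property>
<property><name>mapreduce.cluster.local.dir</name><value>/path/with/space</value></property>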

Spark: OutOfMemoryError: Requested array size exceeds VM limit

孤者浪人 submitted on 2019-12-23 23:16:05
Question: I am running a Spark job on an EMR cluster (one master with 10 slaves) of r3.8xlarge instances:

spark.driver.cores 30
spark.driver.memory 200g
spark.executor.cores 16
spark.executor.instances 40
spark.executor.memory 60g
spark.storage.memoryFraction 0.95
spark.sql.shuffle.partitions 2400
spark.default.parallelism 2400
spark.executor.extraJavaOptions -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:MaxHeapFreeRatio=70 -XX:
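A hedged note on this error: "Requested array size exceeds VM limit" usually means a single JVM array (for example one shuffle block or serialization buffer) grew past the roughly 2 GB array cap, independent of total heap size. One common mitigation is to raise the partition counts so each block shrinks, e.g. in spark-defaults.conf (the numbers are assumptions, not a tested recipe):

# More, smaller partitions keep each shuffle block under the 2 GB array limit.
spark.sql.shuffle.partitions   4800
spark.default.parallelism      4800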

Hadoop 2.2 configuration files (simplified version) - latest version

爱⌒轻易说出口 submitted on 2019-12-23 16:04:37
Preparation steps such as setting up SSH connectivity are omitted here; this post mainly covers the configuration file contents and the startup process, taking servers 192.168.157.100-105 as an example:

1. core-site.xml:

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop-kf100.jd.com:8020</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop/tmp/hadoop-${user.name}</value>
  </property>
  <property>
    <name>hadoop.proxyuser.hadoop.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.hadoop.groups</name>
    <value>*</value>
  </property>
</configuration>

2. hadoop-env.sh: add the JDK installation directory: export
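The truncated export line above presumably sets JAVA_HOME; a typical form, with a hypothetical JDK path:

# hadoop-env.sh -- the path below is hypothetical, use your own JDK install dir
export JAVA_HOME=/usr/local/jdk1.7.0_79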

JDBC driver not found when submitting to YARN from Spark

我的未来我决定 submitted on 2019-12-23 15:56:47
Question: Trying to read all rows from a DB table and write them to another, empty target table. When I issue the following command at the main node, it works as expected:

$ ./bin/spark-submit --class cs.TestJob_publisherstarget --driver-class-path ./lib/mysql-connector-java-5.1.35-bin.jar --jars ./lib/mysql-connector-java-5.1.35-bin.jar,./lib/univocity-parsers-1.5.6.jar,./lib/commons-csv-1.1.1-SNAPSHOT.jar ./lib/uber-ski-spark-job-0.0.1-SNAPSHOT.jar

(Where: uber-ski-spark-job-0.0.1-SNAPSHOT.jar
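A hedged sketch of the usual YARN-mode fix (same jars as in the question): --driver-class-path only affects the locally launched JVM, so when the driver or executors run in YARN containers the JDBC jar must be shipped with --jars and added to both class paths explicitly:

./bin/spark-submit --class cs.TestJob_publisherstarget \
  --master yarn --deploy-mode cluster \
  --jars ./lib/mysql-connector-java-5.1.35-bin.jar \
  --conf spark.driver.extraClassPath=mysql-connector-java-5.1.35-bin.jar \
  --conf spark.executor.extraClassPath=mysql-connector-java-5.1.35-bin.jar \
  ./lib/uber-ski-spark-job-0.0.1-SNAPSHOT.jar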

Spark 1.6.1 SASL

♀尐吖头ヾ submitted on 2019-12-23 13:28:19
Question: I wonder if anyone has gotten SASL to work with Spark 1.6.1 on YARN? The Spark documentation states that you only need three parameters enabled:

spark.authenticate.enableSaslEncryption=true
spark.network.sasl.serverAlwaysEncrypt=true
spark.authenticate=true

http://spark.apache.org/docs/latest/security.html However, upon launching my Spark job with --master yarn and --deploy-mode client, I see the following in my Spark executor logs: 6/05/17 06:50:51 ERROR client.TransportClientFactory:
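A hedged sketch of one commonly missed step: when the YARN external shuffle service is in use, executors authenticate against the NodeManager, so spark.authenticate must also be enabled on the YARN side (for example in the NodeManager's yarn-site.xml), not only in the job's spark-defaults.conf:

<!-- yarn-site.xml on each NodeManager running the Spark shuffle service -->
<property>
  <name>spark.authenticate</name>
  <value>true</value>
</property>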

java.io.IOException: ensureRemaining: Only 0 bytes remaining, trying to read 1

不问归期 submitted on 2019-12-23 12:43:42
Question: I'm having some problems with custom classes in Giraph. I wrote a VertexInput and Output format, but I always get the following error: java.io.IOException: ensureRemaining: Only * bytes remaining, trying to read * with different values in place of the "*". This was tested on a single-node cluster. The problem happens when a vertex iterator calls next() and there aren't any vertices left. The iterator is invoked from a flush method, but I basically don't understand why the "next
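A minimal sketch of the usual cause (the class and fields below are hypothetical): this Giraph error generally means that readFields() tries to consume more bytes than write() produced, so the two methods must serialize exactly the same fields in the same order:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Writable;

public class MyVertexValue implements Writable {
    private long id;
    private double weight;

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeLong(id);       // field 1
        out.writeDouble(weight); // field 2 -- readFields must mirror this order
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        id = in.readLong();       // must match write() exactly, or the stream
        weight = in.readDouble(); // runs out of bytes ("Only 0 bytes remaining")
    }
}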

Apache Spark Native Libraries

若如初见. submitted on 2019-12-23 11:46:40
Question: I was recently able to build Apache Hadoop 2.5.1 with native 64-bit support, so I got rid of the annoying native-libraries warning. Now I'm trying to configure Apache Spark, but when I start spark-shell the same warning appears:

14/09/14 18:48:42 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Some tips: I had to download a pre-built 2.4 version of Spark because there is still no Maven profile for Hadoop 2.5. The
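A hedged sketch of the usual remedy (the path assumes a standard Hadoop layout): point Spark at the freshly built native libraries, for example in conf/spark-env.sh:

# Let NativeCodeLoader find the libhadoop.so built for your platform
export LD_LIBRARY_PATH=$HADOOP_HOME/lib/native:$LD_LIBRARY_PATH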

Making Spark use the /etc/hosts file for binding in YARN cluster mode

笑着哭i submitted on 2019-12-23 08:00:07
Question: I have a Spark cluster set up on a machine with two network interfaces, one public and one private. The /etc/hosts file in the cluster has the internal IP of all the other machines in the cluster, like so: internal_ip FQDN. However, when I request a SparkContext via pyspark in YARN client mode (pyspark --master yarn --deploy-mode client), Akka binds to the public IP and a timeout occurs:

15/11/07 23:29:23 INFO Remoting: Starting remoting
15/11/07 23:29:23 INFO Remoting: Remoting started; listening on
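A hedged sketch of one common workaround: force the bind address onto the private interface by setting SPARK_LOCAL_IP on each node (the address below is a placeholder):

# conf/spark-env.sh -- bind Akka/driver endpoints to the internal interface
export SPARK_LOCAL_IP=internal_ip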

How to solve the YARN container sizing issue on Spark?

天涯浪子 submitted on 2019-12-23 07:06:01
Question: I want to launch some pyspark jobs on YARN. I have 2 nodes with 10 GB each. I am able to open up the pyspark shell like so: pyspark Now when I try to launch a very simple example:

import random
NUM_SAMPLES = 1000
def inside(p):
    x, y = random.random(), random.random()
    return x*x + y*y < 1
count = sc.parallelize(xrange(0, NUM_SAMPLES)) \
    .filter(inside).count()
print "Pi is roughly %f" % (4.0 * count / NUM_SAMPLES)

I get as a result a very long Spark log with the error output. The
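A hedged sketch of a sizing that fits such nodes (the numbers are assumptions for 10 GB NodeManagers, not a tested recipe): request executors small enough that executor memory plus overhead stays under the per-node YARN allocation:

pyspark --master yarn \
  --num-executors 2 \
  --executor-cores 2 \
  --executor-memory 6g \
  --conf spark.yarn.executor.memoryOverhead=1024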