apache-zeppelin

How to fix “Error opening block StreamChunkId” on external spark shuffle service

Submitted by 和自甴很熟 on 2020-03-23 04:13:10
Question: I'm trying to run Spark jobs from my Zeppelin deployment in a Kubernetes cluster. I also have a Spark shuffle service (DaemonSet, v2.2.0-k8s) running in a different namespace. Here are my Spark configs (set on the Zeppelin pod):

--conf spark.kubernetes.executor.docker.image=<spark-executor>
--conf spark.executor.cores=5
--conf spark.driver.memory=5g
--conf spark.executor.memory=5g
--conf spark.kubernetes.authenticate.driver.serviceAccountName=<svc-account>
--conf spark.local.dir=/tmp/spark
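A common cause of "Error opening block StreamChunkId" with an external shuffle service is that the executors and the shuffle-service pods do not share the same local directory, or that the shuffle pods are never discovered at all. As a sketch only, these are the extra flags the apache-spark-on-k8s fork (which v2.2.0-k8s comes from) documents for wiring up its shuffle service; the namespace and label values below are placeholders, and spark.local.dir must point at the same hostPath volume the DaemonSet mounts:

--conf spark.dynamicAllocation.enabled=true
--conf spark.shuffle.service.enabled=true
--conf spark.kubernetes.shuffle.namespace=<shuffle-namespace>
--conf spark.kubernetes.shuffle.labels=<label-selector-for-daemonset-pods>
--conf spark.local.dir=/tmp/spark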

Spark + s3 - error - java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found

Submitted by 橙三吉。 on 2020-03-21 22:04:19
Question: I have a Spark EC2 cluster where I am submitting a PySpark program from a Zeppelin notebook. I have downloaded hadoop-aws-2.7.3.jar and aws-java-sdk-1.11.179.jar and placed them in the /opt/spark/jars directory of the Spark instances. I get a java.lang.NoClassDefFoundError: com/amazonaws/AmazonServiceException. Why is Spark not seeing the jars? Do I have to have the jars on all the slaves and specify a spark-defaults.conf for the master and slaves? Is there something that needs to be configured …
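A frequent cause here is a version mismatch rather than jar placement: hadoop-aws-2.7.3 was compiled against aws-java-sdk-1.7.4, and pairing it with a 1.11.x SDK jar commonly ends in NoClassDefFoundError. A minimal sketch of letting Spark resolve a matching pair itself instead of hand-copying jars (the property can equally go in the Zeppelin spark interpreter settings; bucket and key are placeholders):

--conf spark.jars.packages=org.apache.hadoop:hadoop-aws:2.7.3

// then, in a Zeppelin paragraph:
sc.hadoopConfiguration.set("fs.s3a.access.key", "YOUR_KEY_ID")
sc.hadoopConfiguration.set("fs.s3a.secret.key", "YOUR_SEC_KEY")
val data = sc.textFile("s3a://<bucket>/<key>")
data.count()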

Apache Zeppelin not working with https for maven repo

Submitted by 心不动则不痛 on 2020-03-04 05:07:43
Question: I'm running Apache Zeppelin 0.8.0 in Amazon EMR. Recently the spark interpreter started failing to pull down library dependencies. This was because the zeppelin.interpreter.dep.mvnRepo configuration parameter was set to http://repo1.maven.org/maven2/ and Maven Central has recently stopped supporting http, as outlined here: https://support.sonatype.com/hc/en-us/articles/360041287334. As per the Maven documentation I updated the value of this parameter to https://repo1.maven.org/maven2/ but this …
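For reference, a sketch of where this property lives in conf/zeppelin-site.xml, with the https value from the question. One caveat worth checking: Zeppelin also persists a per-interpreter repository list in conf/interpreter.json, so a stale http:// repo1 entry there (or in the interpreter page's repository dialog) can keep overriding the site-wide setting:

<property>
  <name>zeppelin.interpreter.dep.mvnRepo</name>
  <value>https://repo1.maven.org/maven2/</value>
</property>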

Why Zeppelin notebook is not able to connect to S3

Submitted by 五迷三道 on 2020-02-14 08:08:16
Question: I have installed Zeppelin on my AWS EC2 machine to connect to my Spark cluster. Spark version: standalone, spark-1.2.1-bin-hadoop1.tgz. I am able to connect to the Spark cluster, but I get the following error when trying to access a file in S3 in my use case. Code:

sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "YOUR_KEY_ID")
sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", "YOUR_SEC_KEY")
val file = "s3n://<bucket>/<key>"
val data = sc.textFile(file)
data.count

file: String = s3n:// …
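If the hadoopConfiguration properties are not reaching the workers, a classic workaround from the Spark 1.x era was to embed the credentials in the s3n URI itself (a sketch; the key ID, secret, bucket and key path are placeholders, and a secret containing "/" must be URL-encoded):

val data = sc.textFile("s3n://YOUR_KEY_ID:YOUR_SEC_KEY@<bucket>/<key>")
data.count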

Using Plotly with Zeppelin in Scala

Submitted by ≯℡__Kan透↙ on 2020-02-04 02:14:05
Question: I want to display my results as a histogram in Zeppelin. I came across Plotly. My code is in Scala, and I would like to know the steps to incorporate Plotly into Zeppelin using Scala. Or is there a better way (libraries) to draw a histogram in Zeppelin (Scala)?

Answer 1: If you have a DataFrame called plotTemp with columns "id" and "degree", then you can do the following. In a Scala paragraph, register the DataFrame as a temporary table:

plotTemp.registerTempTable("plotTemp")
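The answer is cut off at this point, but the usual next step once a temp table is registered is to chart it from a %sql paragraph using Zeppelin's built-in visualizations, with no Plotly required. A sketch assuming the plotTemp table above:

%sql
SELECT degree, COUNT(*) AS cnt
FROM plotTemp
GROUP BY degree

Then switch the paragraph's display to the bar-chart view to get the histogram.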

Resource Allocation with Spark and Yarn

Submitted by ≡放荡痞女 on 2020-01-25 04:36:05
Question: I am using Zeppelin 0.7.3 with Spark 2.3 in yarn-client mode. My settings are:

Spark:
spark.driver.memory 4096m
spark.driver.memoryOverhead 3072m
spark.executor.memory 4096m
spark.executor.memoryOverhead 3072m
spark.executor.cores 3
spark.executor.instances 3

YARN:
Minimum allocation: memory:1024, vCores:2
Maximum allocation: memory:9216, vCores:6

The application started by Zeppelin gets the following resources:
Running Containers: 4
Allocated CPU VCores: 4
Allocated Memory MB: 22528

I don't …
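Those numbers are in fact consistent with how YARN sizes containers; a sketch of the arithmetic, assuming the application master container gets the 1024 MB minimum allocation (in yarn-client mode the 4096m driver runs on the Zeppelin host, outside YARN):

executor container = spark.executor.memory + spark.executor.memoryOverhead
                   = 4096 MB + 3072 MB = 7168 MB
3 executors × 7168 MB = 21504 MB
21504 MB + 1024 MB (AM) = 22528 MB across 4 containers

The 4 vCores (one per container, ignoring spark.executor.cores=3) point to the CapacityScheduler's DefaultResourceCalculator, which schedules on memory only and reports one vCore per container; switching to DominantResourceCalculator makes the vCore request count.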

Running zeppelin on spark cluster mode

Submitted by 不羁岁月 on 2020-01-21 09:37:30
Question: I am following this tutorial, "spark cluster on yarn mode in docker container", to launch Zeppelin against a Spark cluster in yarn mode. However, I am stuck at step 4: I can't find conf/zeppelin-env.sh in my Docker container to add further configuration. I tried putting these settings in Zeppelin's conf folder, but I have not been successful so far. Apart from that, the Zeppelin notebook is also not reachable on localhost:9001. I am very new to distributed systems; it would be great if someone could help me start Zeppelin on a Spark cluster in …
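Zeppelin ships only a template for this file, which is most likely why it appears to be missing. A minimal sketch of creating it inside the container (the /path/to values are placeholders; port 9001 is taken from the question and must also be published from the container):

cd $ZEPPELIN_HOME/conf
cp zeppelin-env.sh.template zeppelin-env.sh

# then set, inside zeppelin-env.sh:
export MASTER=yarn-client
export SPARK_HOME=/path/to/spark
export HADOOP_CONF_DIR=/path/to/hadoop/conf
export ZEPPELIN_PORT=9001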