apache-zeppelin

How to fix “Error opening block StreamChunkId” on external spark shuffle service

Submitted by 和自甴很熟 on 2020-03-23 04:13:10
Question: I'm trying to run Spark jobs from my Zeppelin deployment in a Kubernetes cluster. I also have a Spark shuffle service (DaemonSet, v2.2.0-k8s) running in a different namespace. Here are my Spark configs (set on the Zeppelin pod):

--conf spark.kubernetes.executor.docker.image=<spark-executor>
--conf spark.executor.cores=5
--conf spark.driver.memory=5g
--conf spark.executor.memory=5g
--conf spark.kubernetes.authenticate.driver.serviceAccountName=<svc-account>
--conf spark.local.dir=/tmp/spark
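A common cause of "Error opening block StreamChunkId" with an external shuffle service is that the executors and the shuffle-service pods do not share the same local directory, or that the shuffle pods are never discovered at all. As a sketch only, these are the extra flags the apache-spark-on-k8s fork (which v2.2.0-k8s comes from) documents for wiring up its shuffle service; the namespace and label values below are placeholders, and spark.local.dir must point at the same hostPath volume the DaemonSet mounts:

--conf spark.dynamicAllocation.enabled=true
--conf spark.shuffle.service.enabled=true
--conf spark.kubernetes.shuffle.namespace=<shuffle-namespace>
--conf spark.kubernetes.shuffle.labels=<label-selector-for-daemonset-pods>
--conf spark.local.dir=/tmp/spark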

Spark + s3 - error - java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found

Submitted by 橙三吉。 on 2020-03-21 22:04:19
Question: I have a Spark EC2 cluster where I am submitting a PySpark program from a Zeppelin notebook. I have downloaded hadoop-aws-2.7.3.jar and aws-java-sdk-1.11.179.jar and placed them in the /opt/spark/jars directory of the Spark instances. I get a java.lang.NoClassDefFoundError: com/amazonaws/AmazonServiceException. Why is Spark not seeing the jars? Do I have to have the jars on all the slaves and specify a spark-defaults.conf for the master and slaves? Is there something that needs to be configured …
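A frequent cause here is a version mismatch rather than jar placement: hadoop-aws-2.7.3 was compiled against aws-java-sdk-1.7.4, and pairing it with a 1.11.x SDK jar commonly ends in NoClassDefFoundError. A minimal sketch of letting Spark resolve a matching pair itself instead of hand-copying jars (the property can equally go in the Zeppelin spark interpreter settings; bucket and key are placeholders):

--conf spark.jars.packages=org.apache.hadoop:hadoop-aws:2.7.3

// then, in a Zeppelin paragraph:
sc.hadoopConfiguration.set("fs.s3a.access.key", "YOUR_KEY_ID")
sc.hadoopConfiguration.set("fs.s3a.secret.key", "YOUR_SEC_KEY")
val data = sc.textFile("s3a://<bucket>/<key>")
data.count()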

Apache Zeppelin not working with https for maven repo

Submitted by 心不动则不痛 on 2020-03-04 05:07:43
Question: I'm running Apache Zeppelin 0.8.0 in Amazon EMR. Recently the spark interpreter started failing to pull down library dependencies. This was because the zeppelin.interpreter.dep.mvnRepo configuration parameter was set to http://repo1.maven.org/maven2/ and Maven Central has recently stopped supporting http, as outlined here: https://support.sonatype.com/hc/en-us/articles/360041287334. As per the Maven documentation I updated the value of this parameter to https://repo1.maven.org/maven2/ but this …
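For reference, a sketch of where this property lives in conf/zeppelin-site.xml, with the https value from the question. One caveat worth checking: Zeppelin also persists a per-interpreter repository list in conf/interpreter.json, so a stale http:// repo1 entry there (or in the interpreter page's repository dialog) can keep overriding the site-wide setting:

<property>
  <name>zeppelin.interpreter.dep.mvnRepo</name>
  <value>https://repo1.maven.org/maven2/</value>
</property>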

Why Zeppelin notebook is not able to connect to S3

Submitted by 五迷三道 on 2020-02-14 08:08:16
Question: I have installed Zeppelin on my AWS EC2 machine to connect to my Spark cluster. Spark version: standalone, spark-1.2.1-bin-hadoop1.tgz. I am able to connect to the Spark cluster, but I get the following error when trying to access a file in S3 in my use case. Code:

sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "YOUR_KEY_ID")
sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", "YOUR_SEC_KEY")
val file = "s3n://<bucket>/<key>"
val data = sc.textFile(file)
data.count

file: String = s3n:// …
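If the hadoopConfiguration properties are not reaching the workers, a classic workaround from the Spark 1.x era was to embed the credentials in the s3n URI itself (a sketch; the key ID, secret, bucket and key path are placeholders, and a secret containing "/" must be URL-encoded):

val data = sc.textFile("s3n://YOUR_KEY_ID:YOUR_SEC_KEY@<bucket>/<key>")
data.count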

Using Plotly with Zeppelin in Scala

Submitted by ≯℡__Kan透↙ on 2020-02-04 02:14:05
Question: I want to display my results as a histogram in Zeppelin. I came across Plotly. My code is in Scala, and I would like to know the steps to incorporate Plotly into Zeppelin using Scala. Or is there a better way (libraries) to draw a histogram in Zeppelin (Scala)?

Answer 1: If you have a DataFrame called plotTemp with columns "id" and "degree", then you can do the following. In a Scala paragraph, register the DataFrame as a temporary table:

plotTemp.registerTempTable("plotTemp")
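The answer is cut off at this point, but the usual next step once a temp table is registered is to chart it from a %sql paragraph using Zeppelin's built-in visualizations, with no Plotly required. A sketch assuming the plotTemp table above:

%sql
SELECT degree, COUNT(*) AS cnt
FROM plotTemp
GROUP BY degree

Then switch the paragraph's display to the bar-chart view to get the histogram.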

Resource Allocation with Spark and Yarn

Submitted by ≡放荡痞女 on 2020-01-25 04:36:05
Question: I am using Zeppelin 0.7.3 with Spark 2.3 in yarn-client mode. My settings are:

Spark:
spark.driver.memory 4096m
spark.driver.memoryOverhead 3072m
spark.executor.memory 4096m
spark.executor.memoryOverhead 3072m
spark.executor.cores 3
spark.executor.instances 3

YARN:
Minimum allocation: memory:1024, vCores:2
Maximum allocation: memory:9216, vCores:6

The application started by Zeppelin gets the following resources:
Running Containers: 4
Allocated CPU VCores: 4
Allocated Memory MB: 22528

I don't …
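Those numbers are in fact consistent with how YARN sizes containers; a sketch of the arithmetic, assuming the application master container gets the 1024 MB minimum allocation (in yarn-client mode the 4096m driver runs on the Zeppelin host, outside YARN):

executor container = spark.executor.memory + spark.executor.memoryOverhead
                   = 4096 MB + 3072 MB = 7168 MB
3 executors × 7168 MB = 21504 MB
21504 MB + 1024 MB (AM) = 22528 MB across 4 containers

The 4 vCores (one per container, ignoring spark.executor.cores=3) point to the CapacityScheduler's DefaultResourceCalculator, which schedules on memory only and reports one vCore per container; switching to DominantResourceCalculator makes the vCore request count.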

Running zeppelin on spark cluster mode

Submitted by 不羁岁月 on 2020-01-21 09:37:30
Question: I am following this tutorial, "spark cluster on yarn mode in docker container", to launch Zeppelin against a Spark cluster in yarn mode. However, I am stuck at step 4: I can't find conf/zeppelin-env.sh in my Docker container to add further configuration. I tried putting these settings in Zeppelin's conf folder, but I have not been successful so far. Apart from that, the Zeppelin notebook is also not reachable on localhost:9001. I am very new to distributed systems; it would be great if someone could help me start Zeppelin on a Spark cluster in …
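Zeppelin ships only a template for this file, which is most likely why it appears to be missing. A minimal sketch of creating it inside the container (the /path/to values are placeholders; port 9001 is taken from the question and must also be published from the container):

cd $ZEPPELIN_HOME/conf
cp zeppelin-env.sh.template zeppelin-env.sh

# then set, inside zeppelin-env.sh:
export MASTER=yarn-client
export SPARK_HOME=/path/to/spark
export HADOOP_CONF_DIR=/path/to/hadoop/conf
export ZEPPELIN_PORT=9001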