apache-zeppelin

Apache Zeppelin tutorial failing

Submitted by 只谈情不闲聊 on 2019-12-13 19:23:00
Question: I recently installed Zeppelin from git using

    mvn clean package -Pspark-1.5 -Dspark.version=1.5.1 -Phadoop-2.4 -Pyarn -Ppyspark -DskipTests

and I can't run the tutorial because of this error: java.net.ConnectException. Any idea why this is happening? I haven't modified any of the conf files, because I want to run it with the embedded Spark binaries. I have already checked most of the threads here and none of them helped. Thanks. EDIT: I am using a Mac.

Answer 1: Apache Zeppelin uses multi…
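A generic first step, not taken from the truncated answer above: a ConnectException from a paragraph usually means the notebook server could not reach the separate interpreter process, so the interpreter logs are the place to look. A minimal check, assuming a default installation where logs land under ZEPPELIN_HOME/logs:

    # Inspect why the Spark interpreter process did not come up or died.
    # The exact file name varies with user and host; the glob covers that.
    tail -n 100 "$ZEPPELIN_HOME"/logs/zeppelin-interpreter-spark-*.log

    # The main server log can also show bind/port failures:
    tail -n 100 "$ZEPPELIN_HOME"/logs/zeppelin-*.log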

Amazon EMR cluster matplotlib error

Submitted by 感情迁移 on 2019-12-13 12:28:38
Question: I'm using an AWS EMR 5.3.1 cluster with Hadoop, Spark, Hive, and Zeppelin. When I type the following in Zeppelin:

    %python
    import matplotlib.pyplot as plt
    plt.plot([1, 2, 3])

I get the error: ImportError: Gtk3 backend requires pygobject to be installed. How do I solve this?

Answer 1: Before importing the pyplot module, switch matplotlib's backend to Agg:

    import matplotlib
    matplotlib.use('Agg')
    import matplotlib.pyplot as plt
    plt.plot([1, 2, 3])

Source: https://stackoverflow.com/questions/42481911/amazon-emr
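One point worth adding beyond the answer: Agg is a non-interactive backend, so the figure is rendered but never displayed on its own. A common pattern in Zeppelin's %python paragraphs, sketched here rather than taken from the original answer, is to serialize the figure to PNG in memory and emit it through Zeppelin's %html display hook:

    import io
    import base64
    import matplotlib
    matplotlib.use('Agg')            # must happen before pyplot is imported
    import matplotlib.pyplot as plt

    plt.plot([1, 2, 3])

    # Render the figure into an in-memory PNG and inline it as HTML.
    buf = io.BytesIO()
    plt.savefig(buf, format='png')
    buf.seek(0)
    image = base64.b64encode(buf.read()).decode('ascii')
    print('%html <img src="data:image/png;base64,{}">'.format(image))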

Using Python with Zeppelin under the Spark 2 Interpreter

Submitted by 此生再无相见时 on 2019-12-13 03:31:28
Question: I have deployed HDP 2.6.4 on a virtual machine and can see that the spark2 interpreter is not pointing at the correct Python folder. My questions are:

1) How can I find where my Python is located? Solution: type whereis python and you will get a list of locations.

2) How can I update the existing Python libraries and add new libraries to that folder? For example, the equivalent of pip install numpy on the CLI. Nothing clear yet.

3) How can I make Zeppelin Spark2 point at that specific directory that…
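For question 3, the usual lever (a sketch, assuming Zeppelin's standard Spark interpreter; the path /opt/myenv/bin/python is a made-up example, not from the original post) is either the PYSPARK_PYTHON environment variable in conf/zeppelin-env.sh:

    # conf/zeppelin-env.sh
    export PYSPARK_PYTHON=/opt/myenv/bin/python
    export PYSPARK_DRIVER_PYTHON=/opt/myenv/bin/python

or the interpreter property, set in the Zeppelin UI under Interpreter > spark:

    zeppelin.pyspark.python = /opt/myenv/bin/python

After editing either one, restart the interpreter so new paragraphs pick up the binary, and install libraries with that environment's own pip (e.g. /opt/myenv/bin/pip install numpy) so they land where the interpreter will look for them.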

Zeppelin + Spark: Reading Parquet from S3 throws NoSuchMethodError: com.fasterxml.jackson

Submitted by 白昼怎懂夜的黑 on 2019-12-12 14:27:49
Question: Using the Zeppelin 0.7.2 binaries from the main download, and Spark 2.1.0 with Hadoop 2.6, the following paragraph:

    val df = spark.read.parquet(DATA_URL).filter(FILTER_STRING).na.fill("")

produces the following:

    java.lang.NoSuchMethodError: com.fasterxml.jackson.module.scala.deser.BigDecimalDeserializer$.handledType()Ljava/lang/Class;
        at com.fasterxml.jackson.module.scala.deser.NumberDeserializers$.<init>(ScalaNumberDeserializersModule.scala:49)
        at com.fasterxml.jackson.module.scala.deser…
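The stack trace points at a classpath conflict: Zeppelin ships its own jackson jars, and when they differ from the ones Spark 2.1.0's jackson-module-scala expects, handledType() fails exactly like this. A frequently reported workaround, sketched here as an assumption rather than taken from the (truncated) original discussion, is to make Zeppelin's lib/ jackson jars match Spark's:

    # Assumes ZEPPELIN_HOME and SPARK_HOME are set; keep a backup of the jars.
    cd "$ZEPPELIN_HOME/lib"
    mkdir -p /tmp/jackson-backup
    mv jackson-core-*.jar jackson-databind-*.jar jackson-annotations-*.jar /tmp/jackson-backup/
    cp "$SPARK_HOME"/jars/jackson-core-*.jar \
       "$SPARK_HOME"/jars/jackson-databind-*.jar \
       "$SPARK_HOME"/jars/jackson-annotations-*.jar .
    # Restart Zeppelin so the replaced jars are picked up.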

Why Scala Enumeration does not work in Apache Zeppelin but works in Maven

Submitted by 流过昼夜 on 2019-12-12 04:39:13
Question: Enumeration works as expected when I use it in a Maven project (with the same Scala version):

    object t {
      object DashStyle extends Enumeration {
        val Solid, ShortDash = Value
      }
      def f(style: DashStyle.Value) = println(style)
      def main(args: Array[String]) = f(DashStyle.Solid)
    }

But when it runs in Apache Zeppelin (Zeppelin 0.6, Spark 1.6, Scala 2.10, Java 1.8):

    object DashStyle extends Enumeration {
      val Solid, ShortDash = Value
    }
    def f(style: DashStyle.Value) = println(style)
    f(DashStyle.Solid)

It…
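The usual explanation is that the Spark REPL behind Zeppelin wraps each snippet in generated classes, so the path-dependent type DashStyle.Value of a top-level Enumeration no longer lines up between the definition and the call site. A workaround sketch (not taken from the truncated text above): pin the enumeration inside one named container object, so every reference goes through a single stable path:

    object Styles {
      object DashStyle extends Enumeration {
        val Solid, ShortDash = Value
      }
      def f(style: DashStyle.Value): Unit = println(style)
    }

    Styles.f(Styles.DashStyle.Solid)  // prints: Solid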

Zeppelin Oracle SQL query runs forever

Submitted by 梦想的初衷 on 2019-12-12 04:03:37
Question: I am trying to use the Zeppelin (v0.7.0, Java 1.8, on Windows 10; the same happens with Docker v0.7.1) JDBC interpreter to query an Oracle database. So far I've found write-ups like this example. I configured the JDBC interpreter with:

    common.max_count=100
    default.driver=oracle.jdbc.pool.OracleDataSource
    default.password=$password
    default.user=$my_user_name
    default.url=jdbc:oracle:thin:@$host:1521/$service_name
    zeppelin.jdbc.concurrent.max_connection=10
    zeppelin.jdbc.concurrent.use=true

The connection appears to be established…
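One thing that stands out in the configuration above, offered as an observation rather than as the accepted answer: oracle.jdbc.pool.OracleDataSource is a DataSource class, while Zeppelin's default.driver property expects a java.sql.Driver implementation. A corrected sketch, with $host, $service_name, and the credentials left as placeholders exactly as in the question:

    default.driver=oracle.jdbc.driver.OracleDriver
    default.url=jdbc:oracle:thin:@$host:1521/$service_name
    default.user=$my_user_name
    default.password=$password

The Oracle ojdbc jar also has to be on the interpreter's classpath, e.g. added to the JDBC interpreter's dependency list in the UI.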

Interpreter hive not found in Zeppelin's JDBC interpreter

Submitted by 女生的网名这么多〃 on 2019-12-12 02:19:46
Question: I have installed Zeppelin on my CentOS system, but it does not list hive under the JDBC interpreter. Hive is installed on my system, the Hive metastore and hiveserver2 are running, and HIVE_HOME and HADOOP_HOME are set correctly.

Error in the Zeppelin editor:

    paragraph_1490339323949_-1789938581's Interpreter hive not found

Error in the Zeppelin log files:

    ERROR [2017-03-24 15:56:18,913] ({qtp1566723494-18} NotebookServer.java[afterStatusChange]:2018) - Error
    org.apache.zeppelin.interpreter…
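Since Zeppelin 0.7, hive is no longer a standalone interpreter: Hive access goes through the generic JDBC interpreter using prefixed properties. A configuration sketch, assuming HiveServer2 on localhost's default port 10000 (host, port, and credentials are placeholders, not values from the post):

    hive.driver=org.apache.hive.jdbc.HiveDriver
    hive.url=jdbc:hive2://localhost:10000
    hive.user=hiveUser
    hive.password=hivePassword

With those set on the jdbc interpreter (and the Hive JDBC driver added as a dependency), paragraphs can use %jdbc(hive) in place of the old %hive.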

Spark UDF: how to convert a Map to a column

Submitted by 被刻印的时光 ゝ on 2019-12-12 01:42:58
Question: I am using an Apache Zeppelin notebook, so Spark is basically running in interactive mode. I can't use a closure variable here, since Zeppelin throws org.apache.spark.SparkException: Task not serializable as it tries to serialize the whole paragraph (a bigger closure). So without the closure approach, the only option I have is to pass the map to a UDF as a column. I have the following map collected from a paired RDD:

    final val idxMap = idxMapRdd.collectAsMap

which is being used in one of the Spark transformations here:

    def…
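Since the paragraph text above is cut off, here is one self-contained way to do the lookup without capturing a closure, sketched under the assumption of Spark 2.2+ (for typedLit) and a small collected map; the DataFrame df and the column names are made up for illustration:

    import org.apache.spark.sql.functions.{col, typedLit}

    // Collect the small lookup map on the driver (fine only while it is small).
    val idxMap: Map[String, Long] = idxMapRdd.collectAsMap().toMap

    // Embed the map as a literal MapType column; nothing is captured in a closure.
    val idxCol = typedLit(idxMap)

    // Column.apply does key extraction on a map column: idxCol(col(...)).
    val withIdx = df.withColumn("idx", idxCol(col("key")))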

How to change the spark.r.backendConnectionTimeout value?

Submitted by 余生长醉 on 2019-12-11 18:51:36
Question: When I use R in Zeppelin it works, but when I leave Zeppelin running for a day, the next day I get this error for R only:

    sparkR backend is dead, please try to increase spark.r.backendConnectionTimeout

I see from the Spark configuration that the default value is 6000 seconds. Does anybody know how to change this value, and what value would be useful for keeping Zeppelin running all the time? I can use the other interpreters (Python, JDBC, etc.) without hitting this problem.

Answer 1: If you use Zeppelin 0.8…
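The property itself can be set like any other Spark property; a sketch assuming the Zeppelin UI route (the value of one week, 604800 seconds, is an arbitrary example, not a recommendation from the answer): open Interpreter > spark, add

    spark.r.backendConnectionTimeout = 604800

and restart the interpreter. The same key can also go into conf/spark-defaults.conf of the Spark installation Zeppelin points at:

    spark.r.backendConnectionTimeout 604800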

Apache Zeppelin: Running code automatically on startup?

Submitted by 纵饮孤独 on 2019-12-11 12:49:08
Question: This post explains how to add dependencies to Zeppelin from S3. Now I would like to run this code automatically whenever I launch Zeppelin. Is there a way to do that?

Answer 1: Found it. It can be done using the Zeppelin REST API: https://zeppelin.incubator.apache.org/docs/0.5.6-incubating/rest-api/rest-notebook.html

Source: https://stackoverflow.com/questions/37251236/apache-zeppelin-running-code-automatically-on-startup
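For reference, the call that runs all paragraphs of a note is a plain POST against the notebook job endpoint; a sketch assuming a local Zeppelin on port 8080 and a made-up note id 2A94M5J1Z:

    # Run every paragraph of the note once, e.g. from a boot script:
    curl -X POST http://localhost:8080/api/notebook/job/2A94M5J1Z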