apache-zeppelin

Apache Zeppelin tutorial failing

Submitted by 只谈情不闲聊 on 2019-12-13 19:23:00
Question: I recently installed Zeppelin from git using

    mvn clean package -Pspark-1.5 -Dspark.version=1.5.1 -Phadoop-2.4 -Pyarn -Ppyspark -DskipTests

and I can't run the tutorial because of this error: java.net.ConnectException. Any idea why this is happening? I haven't modified any of the conf files, because I want to run it with the embedded Spark binaries. I have already checked most of the threads here and none of them helped. Thanks. EDIT: I am using a Mac.

Answer 1: Apache Zeppelin uses multi…
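A generic first step, not taken from the truncated answer above: a ConnectException from a paragraph usually means the notebook server could not reach the separate interpreter process, so the interpreter logs are the place to look. A minimal check, assuming a default installation where logs land under ZEPPELIN_HOME/logs:

    # Inspect why the Spark interpreter process did not come up or died.
    # The exact file name varies with user and host; the glob covers that.
    tail -n 100 "$ZEPPELIN_HOME"/logs/zeppelin-interpreter-spark-*.log

    # The main server log can also show bind/port failures:
    tail -n 100 "$ZEPPELIN_HOME"/logs/zeppelin-*.log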

Amazon EMR cluster matplotlib error

Submitted by 感情迁移 on 2019-12-13 12:28:38
Question: I'm using an AWS EMR 5.3.1 cluster with Hadoop, Spark, Hive, and Zeppelin. When I type the following in Zeppelin:

    %python
    import matplotlib.pyplot as plt
    plt.plot([1, 2, 3])

I get the error: ImportError: Gtk3 backend requires pygobject to be installed. How do I solve this?

Answer 1: Before importing the pyplot module, switch matplotlib's backend to Agg:

    import matplotlib
    matplotlib.use('Agg')
    import matplotlib.pyplot as plt
    plt.plot([1, 2, 3])

Source: https://stackoverflow.com/questions/42481911/amazon-emr
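One point worth adding beyond the answer: Agg is a non-interactive backend, so the figure is rendered but never displayed on its own. A common pattern in Zeppelin's %python paragraphs, sketched here rather than taken from the original answer, is to serialize the figure to PNG in memory and emit it through Zeppelin's %html display hook:

    import io
    import base64
    import matplotlib
    matplotlib.use('Agg')            # must happen before pyplot is imported
    import matplotlib.pyplot as plt

    plt.plot([1, 2, 3])

    # Render the figure into an in-memory PNG and inline it as HTML.
    buf = io.BytesIO()
    plt.savefig(buf, format='png')
    buf.seek(0)
    image = base64.b64encode(buf.read()).decode('ascii')
    print('%html <img src="data:image/png;base64,{}">'.format(image))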

Using Python with Zeppelin under the Spark 2 Interpreter

Submitted by 此生再无相见时 on 2019-12-13 03:31:28
Question: I have deployed HDP 2.6.4 on a virtual machine and can see that the spark2 interpreter is not pointing at the correct Python folder. My questions are:

1) How can I find where my Python is located? Solution: type whereis python and you will get a list of locations.

2) How can I update the existing Python libraries and add new libraries to that folder? For example, the equivalent of pip install numpy on the CLI. Nothing clear yet.

3) How can I make Zeppelin Spark2 point at that specific directory that…
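For question 3, the usual lever (a sketch, assuming Zeppelin's standard Spark interpreter; the path /opt/myenv/bin/python is a made-up example, not from the original post) is either the PYSPARK_PYTHON environment variable in conf/zeppelin-env.sh:

    # conf/zeppelin-env.sh
    export PYSPARK_PYTHON=/opt/myenv/bin/python
    export PYSPARK_DRIVER_PYTHON=/opt/myenv/bin/python

or the interpreter property, set in the Zeppelin UI under Interpreter > spark:

    zeppelin.pyspark.python = /opt/myenv/bin/python

After editing either one, restart the interpreter so new paragraphs pick up the binary, and install libraries with that environment's own pip (e.g. /opt/myenv/bin/pip install numpy) so they land where the interpreter will look for them.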

Zeppelin + Spark: Reading Parquet from S3 throws NoSuchMethodError: com.fasterxml.jackson

Submitted by 白昼怎懂夜的黑 on 2019-12-12 14:27:49
Question: Using the Zeppelin 0.7.2 binaries from the main download, and Spark 2.1.0 with Hadoop 2.6, the following paragraph:

    val df = spark.read.parquet(DATA_URL).filter(FILTER_STRING).na.fill("")

produces the following:

    java.lang.NoSuchMethodError: com.fasterxml.jackson.module.scala.deser.BigDecimalDeserializer$.handledType()Ljava/lang/Class;
        at com.fasterxml.jackson.module.scala.deser.NumberDeserializers$.<init>(ScalaNumberDeserializersModule.scala:49)
        at com.fasterxml.jackson.module.scala.deser…
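The stack trace points at a classpath conflict: Zeppelin ships its own jackson jars, and when they differ from the ones Spark 2.1.0's jackson-module-scala expects, handledType() fails exactly like this. A frequently reported workaround, sketched here as an assumption rather than taken from the (truncated) original discussion, is to make Zeppelin's lib/ jackson jars match Spark's:

    # Assumes ZEPPELIN_HOME and SPARK_HOME are set; keep a backup of the jars.
    cd "$ZEPPELIN_HOME/lib"
    mkdir -p /tmp/jackson-backup
    mv jackson-core-*.jar jackson-databind-*.jar jackson-annotations-*.jar /tmp/jackson-backup/
    cp "$SPARK_HOME"/jars/jackson-core-*.jar \
       "$SPARK_HOME"/jars/jackson-databind-*.jar \
       "$SPARK_HOME"/jars/jackson-annotations-*.jar .
    # Restart Zeppelin so the replaced jars are picked up.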

Why Scala Enumeration does not work in Apache Zeppelin but works in Maven

Submitted by 流过昼夜 on 2019-12-12 04:39:13
Question: Enumeration works as expected when I use it in a Maven project (with the same Scala version):

    object t {
      object DashStyle extends Enumeration {
        val Solid, ShortDash = Value
      }
      def f(style: DashStyle.Value) = println(style)
      def main(args: Array[String]) = f(DashStyle.Solid)
    }

But when it runs in Apache Zeppelin (Zeppelin 0.6, Spark 1.6, Scala 2.10, Java 1.8):

    object DashStyle extends Enumeration {
      val Solid, ShortDash = Value
    }
    def f(style: DashStyle.Value) = println(style)
    f(DashStyle.Solid)

It…
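The usual explanation is that the Spark REPL behind Zeppelin wraps each snippet in generated classes, so the path-dependent type DashStyle.Value of a top-level Enumeration no longer lines up between the definition and the call site. A workaround sketch (not taken from the truncated text above): pin the enumeration inside one named container object, so every reference goes through a single stable path:

    object Styles {
      object DashStyle extends Enumeration {
        val Solid, ShortDash = Value
      }
      def f(style: DashStyle.Value): Unit = println(style)
    }

    Styles.f(Styles.DashStyle.Solid)  // prints: Solid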

Zeppelin Oracle SQL query runs forever

Submitted by 梦想的初衷 on 2019-12-12 04:03:37
Question: I am trying to use the Zeppelin (v0.7.0, Java 1.8, on Windows 10; the same happens with Docker v0.7.1) JDBC interpreter to query an Oracle database. So far I've found write-ups like this example. I configured the JDBC interpreter with:

    common.max_count=100
    default.driver=oracle.jdbc.pool.OracleDataSource
    default.password=$password
    default.user=$my_user_name
    default.url=jdbc:oracle:thin:@$host:1521/$service_name
    zeppelin.jdbc.concurrent.max_connection=10
    zeppelin.jdbc.concurrent.use=true

The connection appears to be established…
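One thing that stands out in the configuration above, offered as an observation rather than as the accepted answer: oracle.jdbc.pool.OracleDataSource is a DataSource class, while Zeppelin's default.driver property expects a java.sql.Driver implementation. A corrected sketch, with $host, $service_name, and the credentials left as placeholders exactly as in the question:

    default.driver=oracle.jdbc.driver.OracleDriver
    default.url=jdbc:oracle:thin:@$host:1521/$service_name
    default.user=$my_user_name
    default.password=$password

The Oracle ojdbc jar also has to be on the interpreter's classpath, e.g. added to the JDBC interpreter's dependency list in the UI.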

Interpreter hive not found in Zeppelin's JDBC interpreter

Submitted by 女生的网名这么多〃 on 2019-12-12 02:19:46
Question: I have installed Zeppelin on my CentOS system, but it does not list hive under the JDBC interpreter. Hive is installed on my system, the Hive metastore and hiveserver2 are running, and HIVE_HOME and HADOOP_HOME are set correctly.

Error in the Zeppelin editor:

    paragraph_1490339323949_-1789938581's Interpreter hive not found

Error in the Zeppelin log files:

    ERROR [2017-03-24 15:56:18,913] ({qtp1566723494-18} NotebookServer.java[afterStatusChange]:2018) - Error
    org.apache.zeppelin.interpreter…
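Since Zeppelin 0.7, hive is no longer a standalone interpreter: Hive access goes through the generic JDBC interpreter using prefixed properties. A configuration sketch, assuming HiveServer2 on localhost's default port 10000 (host, port, and credentials are placeholders, not values from the post):

    hive.driver=org.apache.hive.jdbc.HiveDriver
    hive.url=jdbc:hive2://localhost:10000
    hive.user=hiveUser
    hive.password=hivePassword

With those set on the jdbc interpreter (and the Hive JDBC driver added as a dependency), paragraphs can use %jdbc(hive) in place of the old %hive.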

Spark UDF: how to convert a Map to a column

Submitted by 被刻印的时光 ゝ on 2019-12-12 01:42:58
Question: I am using an Apache Zeppelin notebook, so Spark is basically running in interactive mode. I can't use a closure variable here, since Zeppelin throws org.apache.spark.SparkException: Task not serializable as it tries to serialize the whole paragraph (a bigger closure). So without the closure approach, the only option I have is to pass the map to a UDF as a column. I have the following map collected from a paired RDD:

    final val idxMap = idxMapRdd.collectAsMap

which is being used in one of the Spark transformations here:

    def…
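Since the paragraph text above is cut off, here is one self-contained way to do the lookup without capturing a closure, sketched under the assumption of Spark 2.2+ (for typedLit) and a small collected map; the DataFrame df and the column names are made up for illustration:

    import org.apache.spark.sql.functions.{col, typedLit}

    // Collect the small lookup map on the driver (fine only while it is small).
    val idxMap: Map[String, Long] = idxMapRdd.collectAsMap().toMap

    // Embed the map as a literal MapType column; nothing is captured in a closure.
    val idxCol = typedLit(idxMap)

    // Column.apply does key extraction on a map column: idxCol(col(...)).
    val withIdx = df.withColumn("idx", idxCol(col("key")))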

How to change the spark.r.backendConnectionTimeout value?

Submitted by 余生长醉 on 2019-12-11 18:51:36
Question: When I use R in Zeppelin it works, but when I leave Zeppelin running for a day, the next day I get this error for R only:

    sparkR backend is dead, please try to increase spark.r.backendConnectionTimeout

I see from the Spark configuration that the default value is 6000 seconds. Does anybody know how to change this value, and what value would be useful for keeping Zeppelin running all the time? I can use the other interpreters (Python, JDBC, etc.) without hitting this problem.

Answer 1: If you use Zeppelin 0.8…
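The property itself can be set like any other Spark property; a sketch assuming the Zeppelin UI route (the value of one week, 604800 seconds, is an arbitrary example, not a recommendation from the answer): open Interpreter > spark, add

    spark.r.backendConnectionTimeout = 604800

and restart the interpreter. The same key can also go into conf/spark-defaults.conf of the Spark installation Zeppelin points at:

    spark.r.backendConnectionTimeout 604800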

Apache Zeppelin: Running code automatically on startup?

Submitted by 纵饮孤独 on 2019-12-11 12:49:08
Question: This post explains how to add dependencies to Zeppelin from S3. Now I would like to run this code automatically whenever I launch Zeppelin. Is there a way to do that?

Answer 1: Found it. It can be done using the Zeppelin REST API: https://zeppelin.incubator.apache.org/docs/0.5.6-incubating/rest-api/rest-notebook.html

Source: https://stackoverflow.com/questions/37251236/apache-zeppelin-running-code-automatically-on-startup
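For reference, the call that runs all paragraphs of a note is a plain POST against the notebook job endpoint; a sketch assuming a local Zeppelin on port 8080 and a made-up note id 2A94M5J1Z:

    # Run every paragraph of the note once, e.g. from a boot script:
    curl -X POST http://localhost:8080/api/notebook/job/2A94M5J1Z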