apache-zeppelin

When registering a table using the %pyspark interpreter in Zeppelin, I can't access the table in %sql

十年热恋, submitted on 2019-12-04 07:20:57
I am using Zeppelin 0.5.5. I found this code/sample for Python here, as I couldn't get my own to work with %pyspark: http://www.makedatauseful.com/python-spark-sql-zeppelin-tutorial/ . I have a feeling his %pyspark example worked because, if you use the original %spark Zeppelin tutorial, the "bank" table is already created. This code is in a notebook:

%pyspark
from os import getcwd
# sqlContext = SQLContext(sc) # Removed with latest version I tested
zeppelinHome = getcwd()
bankText = sc.textFile(zeppelinHome + "/data/bank-full.csv")
bankSchema = StructType([StructField("age", IntegerType(), False)
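A table created in a %pyspark paragraph is only visible to %sql if it is registered with the SQLContext shared by the notebook. A minimal sketch of the missing registration step (the column split and field list beyond "age" are assumptions, since the post is cut off):

```python
%pyspark
from pyspark.sql.types import StructType, StructField, IntegerType, StringType

# Assumed schema and parsing for the bank CSV (semicolon-delimited).
bankSchema = StructType([
    StructField("age", IntegerType(), False),
    StructField("job", StringType(), False),
])
rows = bankText.map(lambda line: line.split(";")) \
               .map(lambda p: (int(p[0]), p[1]))
bank = sqlContext.createDataFrame(rows, bankSchema)
bank.registerTempTable("bank")  # makes the table visible to %sql
```

Provided %pyspark and %sql share one SQLContext (the default in Zeppelin), a later paragraph such as `%sql select age, count(1) from bank group by age` should then resolve the table.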

Build a SparkSession

感情迁移, submitted on 2019-12-04 07:00:57
Question: I have Spark as an interpreter in Zeppelin. I'm using Spark 2.0, and I built a session: Create

Answer 1: In general you should not initialize a SparkSession or SparkContext in Zeppelin. Zeppelin notebooks are configured to create the session for you, and their correct behavior depends on using the provided objects. Initializing your own SparkSession will break core Zeppelin functionality, and multiple SparkContexts will break things completely in the worst-case scenario. Is set spark.driver.allowMultipleContexts
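The answer's point can be shown as a short sketch: in Zeppelin, use the injected objects instead of building your own session (the variable names below are the ones Zeppelin provides):

```scala
%spark
// Do NOT call SparkSession.builder()...getOrCreate() here; Zeppelin has
// already created `spark` (SparkSession, Spark 2.x) and `sc` (SparkContext).
val df = spark.range(5).toDF("n")
df.show()
```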

Getting an error while building Apache Zeppelin

空扰寡人, submitted on 2019-12-04 05:37:40
Question: I already have Hadoop set up with Cloudera. I wanted to install Zeppelin to connect to Hive and build a UI for my queries. While building Zeppelin with the following command:

sudo mvn clean package -Pspark-1.3 -Dspark.version=1.3.0 -Dhadoop.version=2.6.0-cdh5.4.7 -Phadoop-2.6 -Pyarn -DskipTests

I get this error in the web-application module:

[ERROR] npm ERR! Linux 3.19.0-71-generic
[ERROR] npm ERR! argv "/home/zeppelin/incubator-zeppelin/zeppelin-web/node/node" "/home
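npm failures inside the zeppelin-web module are usually environment problems rather than Maven ones. One commonly suggested workaround (an assumption, not a confirmed fix for this exact truncated log) is to clear the node artifacts left behind by previous failed builds and retry:

```shell
# Remove the node runtime and modules the frontend plugin downloaded,
# so the zeppelin-web build starts from a clean state, then rebuild.
rm -rf /home/zeppelin/incubator-zeppelin/zeppelin-web/node \
       /home/zeppelin/incubator-zeppelin/zeppelin-web/node_modules
sudo mvn clean package -Pspark-1.3 -Dspark.version=1.3.0 \
  -Dhadoop.version=2.6.0-cdh5.4.7 -Phadoop-2.6 -Pyarn -DskipTests
```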

Is there a better interface for adding Highcharts support to Zeppelin?

醉酒当歌, submitted on 2019-12-04 04:32:07
Question: Apache Zeppelin has good support for AngularJS, but there is a gap between Scala and JavaScript. I am trying to add Highcharts support to Zeppelin to fill this gap. The main goal is to plot simply and directly from a Spark DataFrame. After a couple of rounds of refactoring, I have come up with the following interface: github.com/knockdata/zeppelin-highcharts . Here are two options. Which option is better?

Option A. This is an example of plotting with Highcharts:

highcharts(bank, "marital", List("name" -> "age", "y" ->
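Whichever DSL wins, the plumbing both options need is moving aggregated data from Spark to the browser. A hedged sketch of that step using Zeppelin's ZeppelinContext (the `bank` columns follow the tutorial data; the Angular variable name is invented for illustration):

```scala
%spark
// Aggregate in Spark, then bind the JSON-ish result for the AngularJS
// layer, where a Highcharts directive can pick it up in an %angular paragraph.
val maritalCounts = bank.groupBy("marital").count().collect()
  .map(r => s"""{"name":"${r.getString(0)}","y":${r.getLong(1)}}""")
  .mkString("[", ",", "]")
z.angularBind("maritalCounts", maritalCounts)
```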

com.fasterxml.jackson.databind.JsonMappingException: Jackson version is too old 2.5.3

谁都会走, submitted on 2019-12-04 04:23:22
My OS is OS X 10.11.6, and I'm running Spark 2.0, Zeppelin 0.6, and Scala 2.11. When I run this code in Zeppelin I get an exception from Jackson; when I run the same code in spark-shell, there is no exception.

val filestream = ssc.textFileStream("/Users/davidlaxer/first-edition/ch06")

com.fasterxml.jackson.databind.JsonMappingException: Jackson version is too old 2.5.3
    at com.fasterxml.jackson.module.scala.JacksonModule$class.setupModule(JacksonModule.scala:56)
    at com.fasterxml.jackson.module.scala.DefaultScalaModule.setupModule(DefaultScalaModule.scala:19)
    at com.fasterxml.jackson.databind.ObjectMapper
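The usual cause is a classpath mix: Zeppelin 0.6 ships an older jackson-databind (the 2.5.3 in the message) that wins over the 2.6.x version Spark 2.0's jackson-module-scala expects, while spark-shell sees only Spark's own jars. One commonly suggested workaround (hedged; exact versions may differ for your build) is to load a matching Jackson before the Spark interpreter starts:

```scala
%dep
// Run this before any %spark paragraph (restart the interpreter first).
z.load("com.fasterxml.jackson.core:jackson-databind:2.6.5")
z.load("com.fasterxml.jackson.module:jackson-module-scala_2.11:2.6.5")
```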

How to install libraries to python in zeppelin-spark2 in HDP

左心房为你撑大大i, submitted on 2019-12-03 21:47:04
I am using HDP version 2.6.4. Can you provide step-by-step instructions on how to install libraries into the following Python directory used by spark2?

sc.version (the Spark version) returns: res0: String = 2.2.0.2.6.4.0-91

The spark2 interpreter property name and value are:

zeppelin.pyspark.python: /usr/local/Python-3.4.8/bin/python3.4

The Python version and currently installed libraries are:

%spark2.pyspark
import pip
import sys
sorted(["%s==%s" % (i.key, i.version) for i in pip.get_installed_distributions()])
print("--")
print(sys.version)
print("--")
print(installed_packages_list)

--
3.4.8 (default,
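Because zeppelin.pyspark.python points at /usr/local/Python-3.4.8/bin/python3.4, libraries must be installed with that exact interpreter, and on every node that runs Spark executors, not just where Zeppelin lives. A hedged sketch (pandas is only an example package):

```shell
# Use the interpreter binary from zeppelin.pyspark.python, not the system pip.
sudo /usr/local/Python-3.4.8/bin/python3.4 -m pip install pandas
# Repeat on each worker node, then restart the spark2 interpreter in Zeppelin.
```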

SparkSession returns nothing from a HiveServer2 connection through JDBC

别说谁变了你拦得住时间么, submitted on 2019-12-03 14:32:47
I have an issue reading data from a remote HiveServer2 over JDBC with a SparkSession in Apache Zeppelin. Here is the code:

%spark
import org.apache.spark.sql.Row
import org.apache.spark.sql.SparkSession

val prop = new java.util.Properties
prop.setProperty("user", "hive")
prop.setProperty("password", "hive")
prop.setProperty("driver", "org.apache.hive.jdbc.HiveDriver")

val test = spark.read.jdbc("jdbc:hive2://xxx.xxx.xxx.xxx:10000/", "tests.hello_world", prop)
test.select("*").show()

When I run this, I get no errors, but no data either; I just get back all the column names of the table, like this:
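A frequently cited explanation (hedged, since the post is truncated) is that Spark's generic JDBC dialect quotes identifiers as "col", which HiveServer2 parses as string literals, so every row comes back containing the column names themselves. Registering a Hive-aware dialect that quotes with backticks is the usual workaround:

```scala
%spark
import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects}

object HiveDialect extends JdbcDialect {
  override def canHandle(url: String): Boolean =
    url.startsWith("jdbc:hive2")
  // Backticks instead of double quotes, so Hive treats names as identifiers.
  override def quoteIdentifier(colName: String): String = s"`$colName`"
}
JdbcDialects.registerDialect(HiveDialect)
// Re-run spark.read.jdbc(...) after registering the dialect.
```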

Is it possible to set global variables in a Zeppelin Notebook?

こ雲淡風輕ζ, submitted on 2019-12-03 14:19:44
I'm trying to create a multi-paragraph dashboard using a Zeppelin notebook. I'd like people using the dashboard to only have to enter certain parameters once. E.g., if I'm making a dashboard with information about different websites, the user should only have to select the particular website they want information about once, and the whole multi-paragraph dashboard will update. Is this possible? How do I set global variables like this in a notebook? To clarify, the parameter input that I intend to use is what Zeppelin calls a "dynamic form".

Rockie Yang: Using z.put and z.get can share
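A minimal sketch of the z.put/z.get pattern the answer alludes to (the parameter name `site` and its default are invented for illustration):

```scala
// Paragraph 1 (%spark): read the dynamic-form input once and store it.
val site = z.input("site", "example.com").toString
z.put("site", site)

// Paragraph 2 (%spark): any later paragraph can read the shared value back.
val selected = z.get("site").toString
```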

Moving a Spark DataFrame from Python to Scala within Zeppelin

别等时光非礼了梦想., submitted on 2019-12-03 09:04:31
I created a Spark DataFrame in a Python paragraph in Zeppelin:

%pyspark
sqlCtx = SQLContext(sc)
spDf = sqlCtx.createDataFrame(df)

where df is a pandas DataFrame:

print(type(df))
<class 'pandas.core.frame.DataFrame'>

What I want to do is move spDf from one Python paragraph to another, Scala paragraph. A reasonable way to do this looks to be z.put:

z.put("spDf", spDf)

but I got this error:

AttributeError: 'DataFrame' object has no attribute '_get_object_id'

Any suggestion to fix the error, or another way to move spDf?

zero323: You can put the internal Java object, not the Python wrapper:

%pyspark
df = sc
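zero323's approach (the answer is cut off above) is to put the underlying Java object rather than the PySpark wrapper, then rewrap it on the Scala side. A hedged reconstruction for Spark 1.x:

```scala
// %pyspark paragraph: put the internal Java DataFrame, not the wrapper:
//   z.put("spDf", spDf._jdf)
// %spark paragraph: cast it back to a Scala DataFrame.
val spDf = z.get("spDf").asInstanceOf[org.apache.spark.sql.DataFrame]
spDf.show()
```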

How to set up Zeppelin to work with remote EMR Yarn cluster

守給你的承諾、, submitted on 2019-12-03 08:30:25
I have an Amazon EMR Hadoop v2.6 cluster with Spark 1.4.1, using the Yarn resource manager. I want to deploy Zeppelin on a separate machine so that the EMR cluster can be turned off when no jobs are running. I tried following the instructions here without much success: https://zeppelin.incubator.apache.org/docs/install/yarn_install.html . Can somebody demystify the steps for connecting Zeppelin to an existing Yarn cluster from a different machine?

[1] Install Zeppelin with the proper params:

git clone https://github.com/apache/incubator-zeppelin.git ~/zeppelin
cd ~/zeppelin
mvn clean package -Pspark-1.4 -Dhadoop
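The step most write-ups gloss over is pointing Zeppelin at the cluster's Hadoop configuration so the Spark interpreter can locate the Yarn ResourceManager. A hedged sketch (paths assume EMR defaults; copy the config from the EMR master first):

```shell
# Copy /etc/hadoop/conf from the EMR master node to this machine, then tell
# Zeppelin where it lives before starting the daemon.
echo "export HADOOP_CONF_DIR=/etc/hadoop/conf" >> ~/zeppelin/conf/zeppelin-env.sh
# In the Zeppelin UI, set the spark interpreter property master=yarn-client,
# then restart the interpreter.
```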