apache-zeppelin

When registering a table using the %pyspark interpreter in Zeppelin, I can't access the table in %sql

十年热恋, submitted on 2019-12-04 07:20:57
I am using Zeppelin 0.5.5. I found this code/sample for Python here, as I couldn't get my own to work with %pyspark: http://www.makedatauseful.com/python-spark-sql-zeppelin-tutorial/ . I have a feeling his %pyspark example worked because, if you use the original %spark Zeppelin tutorial, the "bank" table is already created. This code is in a notebook:

%pyspark
from os import getcwd
# sqlContext = SQLContext(sc) # Removed with latest version I tested
zeppelinHome = getcwd()
bankText = sc.textFile(zeppelinHome + "/data/bank-full.csv")
bankSchema = StructType([StructField("age", IntegerType(), False)
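A table created in a %pyspark paragraph is only visible to %sql if it is registered with the SQLContext shared by the notebook. A minimal sketch of the missing registration step (the column split and field list beyond "age" are assumptions, since the post is cut off):

```python
%pyspark
from pyspark.sql.types import StructType, StructField, IntegerType, StringType

# Assumed schema and parsing for the bank CSV (semicolon-delimited).
bankSchema = StructType([
    StructField("age", IntegerType(), False),
    StructField("job", StringType(), False),
])
rows = bankText.map(lambda line: line.split(";")) \
               .map(lambda p: (int(p[0]), p[1]))
bank = sqlContext.createDataFrame(rows, bankSchema)
bank.registerTempTable("bank")  # makes the table visible to %sql
```

Provided %pyspark and %sql share one SQLContext (the default in Zeppelin), a later paragraph such as `%sql select age, count(1) from bank group by age` should then resolve the table.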

Build a SparkSession

感情迁移, submitted on 2019-12-04 07:00:57
Question: I have Spark as an interpreter in Zeppelin. I'm using Spark 2.0, and I built a session: Create

Answer 1: In general you should not initialize a SparkSession or SparkContext in Zeppelin. Zeppelin notebooks are configured to create the session for you, and their correct behavior depends on using the provided objects. Initializing your own SparkSession will break core Zeppelin functionality, and multiple SparkContexts will break things completely in the worst-case scenario. Is set spark.driver.allowMultipleContexts
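The answer's point can be shown as a short sketch: in Zeppelin, use the injected objects instead of building your own session (the variable names below are the ones Zeppelin provides):

```scala
%spark
// Do NOT call SparkSession.builder()...getOrCreate() here; Zeppelin has
// already created `spark` (SparkSession, Spark 2.x) and `sc` (SparkContext).
val df = spark.range(5).toDF("n")
df.show()
```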

Getting an error while building Apache Zeppelin

空扰寡人, submitted on 2019-12-04 05:37:40
Question: I already have Hadoop set up with Cloudera. I wanted to install Zeppelin to connect to Hive and build a UI for my queries. While building Zeppelin with the following command:

sudo mvn clean package -Pspark-1.3 -Dspark.version=1.3.0 -Dhadoop.version=2.6.0-cdh5.4.7 -Phadoop-2.6 -Pyarn -DskipTests

I get this error in the web-application module:

[ERROR] npm ERR! Linux 3.19.0-71-generic
[ERROR] npm ERR! argv "/home/zeppelin/incubator-zeppelin/zeppelin-web/node/node" "/home
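npm failures inside the zeppelin-web module are usually environment problems rather than Maven ones. One commonly suggested workaround (an assumption, not a confirmed fix for this exact truncated log) is to clear the node artifacts left behind by previous failed builds and retry:

```shell
# Remove the node runtime and modules the frontend plugin downloaded,
# so the zeppelin-web build starts from a clean state, then rebuild.
rm -rf /home/zeppelin/incubator-zeppelin/zeppelin-web/node \
       /home/zeppelin/incubator-zeppelin/zeppelin-web/node_modules
sudo mvn clean package -Pspark-1.3 -Dspark.version=1.3.0 \
  -Dhadoop.version=2.6.0-cdh5.4.7 -Phadoop-2.6 -Pyarn -DskipTests
```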

Is there a better interface for adding Highcharts support to Zeppelin?

醉酒当歌, submitted on 2019-12-04 04:32:07
Question: Apache Zeppelin has good support for AngularJS, but there is a gap between Scala and JavaScript. I am trying to add Highcharts support to Zeppelin to fill this gap. The main goal is to plot simply and directly from a Spark DataFrame. After a couple of rounds of refactoring, I have come up with the following interface: github.com/knockdata/zeppelin-highcharts . Here are two options. Which option is better?

Option A. This is an example of plotting with Highcharts:

highcharts(bank, "marital", List("name" -> "age", "y" ->
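Whichever DSL wins, the plumbing both options need is moving aggregated data from Spark to the browser. A hedged sketch of that step using Zeppelin's ZeppelinContext (the `bank` columns follow the tutorial data; the Angular variable name is invented for illustration):

```scala
%spark
// Aggregate in Spark, then bind the JSON-ish result for the AngularJS
// layer, where a Highcharts directive can pick it up in an %angular paragraph.
val maritalCounts = bank.groupBy("marital").count().collect()
  .map(r => s"""{"name":"${r.getString(0)}","y":${r.getLong(1)}}""")
  .mkString("[", ",", "]")
z.angularBind("maritalCounts", maritalCounts)
```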

com.fasterxml.jackson.databind.JsonMappingException: Jackson version is too old 2.5.3

谁都会走, submitted on 2019-12-04 04:23:22
My OS is OS X 10.11.6, and I'm running Spark 2.0, Zeppelin 0.6, and Scala 2.11. When I run this code in Zeppelin I get an exception from Jackson; when I run the same code in spark-shell, there is no exception.

val filestream = ssc.textFileStream("/Users/davidlaxer/first-edition/ch06")

com.fasterxml.jackson.databind.JsonMappingException: Jackson version is too old 2.5.3
    at com.fasterxml.jackson.module.scala.JacksonModule$class.setupModule(JacksonModule.scala:56)
    at com.fasterxml.jackson.module.scala.DefaultScalaModule.setupModule(DefaultScalaModule.scala:19)
    at com.fasterxml.jackson.databind.ObjectMapper
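The usual cause is a classpath mix: Zeppelin 0.6 ships an older jackson-databind (the 2.5.3 in the message) that wins over the 2.6.x version Spark 2.0's jackson-module-scala expects, while spark-shell sees only Spark's own jars. One commonly suggested workaround (hedged; exact versions may differ for your build) is to load a matching Jackson before the Spark interpreter starts:

```scala
%dep
// Run this before any %spark paragraph (restart the interpreter first).
z.load("com.fasterxml.jackson.core:jackson-databind:2.6.5")
z.load("com.fasterxml.jackson.module:jackson-module-scala_2.11:2.6.5")
```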

How to install libraries to python in zeppelin-spark2 in HDP

左心房为你撑大大i, submitted on 2019-12-03 21:47:04
I am using HDP version 2.6.4. Can you provide step-by-step instructions on how to install libraries into the following Python directory used by spark2?

sc.version (the Spark version) returns: res0: String = 2.2.0.2.6.4.0-91

The spark2 interpreter property name and value are:

zeppelin.pyspark.python: /usr/local/Python-3.4.8/bin/python3.4

The Python version and currently installed libraries are:

%spark2.pyspark
import pip
import sys
sorted(["%s==%s" % (i.key, i.version) for i in pip.get_installed_distributions()])
print("--")
print(sys.version)
print("--")
print(installed_packages_list)

--
3.4.8 (default,
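Because zeppelin.pyspark.python points at /usr/local/Python-3.4.8/bin/python3.4, libraries must be installed with that exact interpreter, and on every node that runs Spark executors, not just where Zeppelin lives. A hedged sketch (pandas is only an example package):

```shell
# Use the interpreter binary from zeppelin.pyspark.python, not the system pip.
sudo /usr/local/Python-3.4.8/bin/python3.4 -m pip install pandas
# Repeat on each worker node, then restart the spark2 interpreter in Zeppelin.
```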

SparkSession returns nothing from a HiveServer2 connection through JDBC

别说谁变了你拦得住时间么, submitted on 2019-12-03 14:32:47
I have an issue reading data from a remote HiveServer2 over JDBC with a SparkSession in Apache Zeppelin. Here is the code:

%spark
import org.apache.spark.sql.Row
import org.apache.spark.sql.SparkSession

val prop = new java.util.Properties
prop.setProperty("user", "hive")
prop.setProperty("password", "hive")
prop.setProperty("driver", "org.apache.hive.jdbc.HiveDriver")

val test = spark.read.jdbc("jdbc:hive2://xxx.xxx.xxx.xxx:10000/", "tests.hello_world", prop)
test.select("*").show()

When I run this, I get no errors, but no data either; I just get back all the column names of the table, like this:
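A frequently cited explanation (hedged, since the post is truncated) is that Spark's generic JDBC dialect quotes identifiers as "col", which HiveServer2 parses as string literals, so every row comes back containing the column names themselves. Registering a Hive-aware dialect that quotes with backticks is the usual workaround:

```scala
%spark
import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects}

object HiveDialect extends JdbcDialect {
  override def canHandle(url: String): Boolean =
    url.startsWith("jdbc:hive2")
  // Backticks instead of double quotes, so Hive treats names as identifiers.
  override def quoteIdentifier(colName: String): String = s"`$colName`"
}
JdbcDialects.registerDialect(HiveDialect)
// Re-run spark.read.jdbc(...) after registering the dialect.
```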

Is it possible to set global variables in a Zeppelin Notebook?

こ雲淡風輕ζ, submitted on 2019-12-03 14:19:44
I'm trying to create a multi-paragraph dashboard using a Zeppelin notebook. I'd like people using the dashboard to only have to enter certain parameters once. E.g., if I'm making a dashboard with information about different websites, the user should only have to select the particular website they want information about once, and the whole multi-paragraph dashboard will update. Is this possible? How do I set global variables like this in a notebook? To clarify, the parameter input that I intend to use is what Zeppelin calls a "dynamic form".

Rockie Yang: Using z.put and z.get can share
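A minimal sketch of the z.put/z.get pattern the answer alludes to (the parameter name `site` and its default are invented for illustration):

```scala
// Paragraph 1 (%spark): read the dynamic-form input once and store it.
val site = z.input("site", "example.com").toString
z.put("site", site)

// Paragraph 2 (%spark): any later paragraph can read the shared value back.
val selected = z.get("site").toString
```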

Moving a Spark DataFrame from Python to Scala within Zeppelin

别等时光非礼了梦想., submitted on 2019-12-03 09:04:31
I created a Spark DataFrame in a Python paragraph in Zeppelin:

%pyspark
sqlCtx = SQLContext(sc)
spDf = sqlCtx.createDataFrame(df)

where df is a pandas DataFrame:

print(type(df))
<class 'pandas.core.frame.DataFrame'>

What I want to do is move spDf from one Python paragraph to another, Scala paragraph. A reasonable way to do this looks to be z.put:

z.put("spDf", spDf)

but I got this error:

AttributeError: 'DataFrame' object has no attribute '_get_object_id'

Any suggestion to fix the error, or another way to move spDf?

zero323: You can put the internal Java object, not the Python wrapper:

%pyspark
df = sc
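zero323's approach (the answer is cut off above) is to put the underlying Java object rather than the PySpark wrapper, then rewrap it on the Scala side. A hedged reconstruction for Spark 1.x:

```scala
// %pyspark paragraph: put the internal Java DataFrame, not the wrapper:
//   z.put("spDf", spDf._jdf)
// %spark paragraph: cast it back to a Scala DataFrame.
val spDf = z.get("spDf").asInstanceOf[org.apache.spark.sql.DataFrame]
spDf.show()
```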

How to set up Zeppelin to work with remote EMR Yarn cluster

守給你的承諾、, submitted on 2019-12-03 08:30:25
I have an Amazon EMR Hadoop v2.6 cluster with Spark 1.4.1, using the Yarn resource manager. I want to deploy Zeppelin on a separate machine so that the EMR cluster can be turned off when no jobs are running. I tried following the instructions here without much success: https://zeppelin.incubator.apache.org/docs/install/yarn_install.html . Can somebody demystify the steps for connecting Zeppelin to an existing Yarn cluster from a different machine?

[1] Install Zeppelin with the proper params:

git clone https://github.com/apache/incubator-zeppelin.git ~/zeppelin
cd ~/zeppelin
mvn clean package -Pspark-1.4 -Dhadoop
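The step most write-ups gloss over is pointing Zeppelin at the cluster's Hadoop configuration so the Spark interpreter can locate the Yarn ResourceManager. A hedged sketch (paths assume EMR defaults; copy the config from the EMR master first):

```shell
# Copy /etc/hadoop/conf from the EMR master node to this machine, then tell
# Zeppelin where it lives before starting the daemon.
echo "export HADOOP_CONF_DIR=/etc/hadoop/conf" >> ~/zeppelin/conf/zeppelin-env.sh
# In the Zeppelin UI, set the spark interpreter property master=yarn-client,
# then restart the interpreter.
```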