apache-zeppelin

Apache Zeppelin - How to use Helium framework in Apache Zeppelin

假装没事ソ submitted on 2019-11-30 07:21:05
Since Zeppelin 0.7, Zeppelin has supported Helium plugins/packages through the Helium framework. However, I am not able to view any of the plugins on the Helium page (localhost:8080/#/helium). As per this JIRA, I placed the sample Helium.json (available on S3) under /local-repo/helium-registry-cache. However, after that I got an NPE while restarting the Apache Zeppelin service. I have tried Zeppelin 0.7 as well as Zeppelin 0.8.0 snapshot versions. In particular, I want to use the map Helium package - Helium-Map - in a Zeppelin note. Can someone point me to a guide or documentation with detailed steps for using
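For orientation, a Helium package is described by a small JSON descriptor. Below is a minimal sketch of writing one into the local registry cache; the descriptor fields follow the documented VISUALIZATION format, but the package name, artifact, and the array wrapping of the cache file are illustrative assumptions, not taken from the question:

    import json
    import os

    # Hypothetical descriptor for an npm-published Helium visualization.
    descriptor = {
        "type": "VISUALIZATION",
        "name": "zeppelin_helium_map",              # assumed package name
        "description": "Map visualization (example)",
        "artifact": "zeppelin-helium-map@0.0.1",    # npm coordinate: name@version
        "license": "Apache-2.0",
        "icon": "<i class='fa fa-map'></i>",
    }

    cache_dir = "local-repo/helium-registry-cache"  # relative to ZEPPELIN_HOME
    os.makedirs(cache_dir, exist_ok=True)
    with open(os.path.join(cache_dir, "helium.json"), "w") as f:
        json.dump([descriptor], f, indent=2)        # assumed layout: array of descriptors

If the file the registry cache reads is malformed for your version, Zeppelin can fail during startup, which would match the NPE described above.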

Using d3.js with Apache Zeppelin

 ̄綄美尐妖づ submitted on 2019-11-30 02:35:15
Question: I'm trying to add more visualization options to Apache Zeppelin by integrating it with d3.js. I found an example where someone did this with leaflet.js here, and tried to do something similar -- unfortunately I'm not too familiar with AngularJS (which Zeppelin uses to interpret front-end code). I'm also not streaming data. Below is my code, using just a simple tutorial example from d3.js:

    %angular
    <div>
      <svg class="chart"></svg>
    </div>
    <script>
    function useD3() {
      var data = [4, 8, 15, 16, 23,
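One way to avoid hardcoding the numbers inside the <script> is to bind them from a Spark paragraph and read them from the Angular scope. A minimal sketch, assuming z.angularBind is exposed to the pyspark interpreter in your Zeppelin version, and using chartData as a made-up variable name:

    %pyspark
    # Bind the sample data so a following %angular paragraph can read it
    # from its scope (e.g. as {{chartData}} or via a scope watcher).
    z.angularBind("chartData", [4, 8, 15, 16, 23, 42])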

How to get the output from console streaming sink in Zeppelin?

亡梦爱人 submitted on 2019-11-29 14:23:24
Question: I'm struggling to get the console sink working with PySpark Structured Streaming when run from Zeppelin. Basically, I'm not seeing any results printed to the screen or in any log files I've found. My question: does anyone have a working example of using PySpark Structured Streaming with a sink that produces output visible in Apache Zeppelin? Ideally it would also use the socket source, as that's easy to test with. I'm using: Ubuntu 16.04, spark-2.2.0-bin-hadoop2.7, zeppelin-0.7.3-bin-all
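For context, here is a minimal standalone sketch of the socket-source/console-sink pipeline in question; the host, port, and output mode are assumptions, and you would start a feed first with nc -lk 9999:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("console-sink-demo").getOrCreate()

    # Read lines from a local socket (start one with: nc -lk 9999).
    lines = (spark.readStream
             .format("socket")
             .option("host", "localhost")
             .option("port", 9999)
             .load())

    # Print each micro-batch to stdout. Under Zeppelin, this stdout belongs to
    # the interpreter process, so batches typically land in the interpreter
    # log rather than in the notebook's paragraph output.
    query = (lines.writeStream
             .outputMode("append")
             .format("console")
             .start())

    query.awaitTermination()

The comment above is also the usual explanation for "no output": the console sink writes to the driver's stdout, which Zeppelin captures in its interpreter log files, not in the note.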

Register UDF to SqlContext from Scala to use in PySpark

蓝咒 submitted on 2019-11-29 07:25:39
Is it possible to register a UDF (or function) written in Scala for use in PySpark? E.g.:

    val mytable = sc.parallelize(1 to 2).toDF("spam")
    mytable.registerTempTable("mytable")
    def addOne(m: Integer): Integer = m + 1
    // Spam: 1, 2

In Scala, the following is now possible:

    val UDFaddOne = sqlContext.udf.register("UDFaddOne", addOne _)
    val mybiggertable = mytable.withColumn("moreSpam", UDFaddOne(mytable("spam")))
    // Spam: 1, 2
    // moreSpam: 2, 3

I would like to use "UDFaddOne" in PySpark like:

    %pyspark
    mytable = sqlContext.table("mytable")
    UDFaddOne = sqlContext.udf("UDFaddOne")  # does not work
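One workaround sketch: because Zeppelin's Scala and pyspark paragraphs share the same SQLContext by default, a UDF registered by name in Scala can be invoked from Python through a SQL expression (the table and column names follow the example above):

    %pyspark
    from pyspark.sql.functions import expr

    mytable = sqlContext.table("mytable")

    # Call the Scala-registered UDF by name inside a SQL expression.
    mybiggertable = mytable.withColumn("moreSpam", expr("UDFaddOne(spam)"))
    mybiggertable.show()

Equivalently, sqlContext.sql("SELECT spam, UDFaddOne(spam) AS moreSpam FROM mytable") works, since registration via sqlContext.udf.register makes the function visible to the SQL parser.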

Remove Temporary Tables from Apache Spark SQL

一个人想着一个人 submitted on 2019-11-28 21:21:12
I have registered a temp table in Apache Spark using Zeppelin as below:

    val hvacText = sc.textFile("...")
    case class Hvac(date: String, time: String, targettemp: Integer, actualtemp: Integer, buildingID: String)
    val hvac = hvacText.map(s => s.split(",")).filter(s => s(0) != "Date").map(s => Hvac(s(0), s(1), s(2).toInt, s(3).toInt, s(6))).toDF()
    hvac.registerTempTable("hvac")

After I am done with my queries against this temp table, how do I remove it? I checked all the docs and it seems I am getting nowhere. Any guidance?

Spark 2.x: For temporary views you can use Catalog.dropTempView: spark.catalog
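The truncated answer presumably continues with spark.catalog.dropTempView; for reference, the equivalent calls from a %pyspark paragraph:

    %pyspark
    # Spark 2.x: temp tables are temporary views.
    spark.catalog.dropTempView("hvac")

    # Spark 1.x equivalent:
    # sqlContext.dropTempTable("hvac")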

What to set `SPARK_HOME` to?

耗尽温柔 submitted on 2019-11-28 20:26:39
Installed apache-maven-3.3.3 and scala 2.11.6, then ran:

    $ git clone git://github.com/apache/spark.git -b branch-1.4
    $ cd spark
    $ build/mvn -DskipTests clean package

Finally:

    $ git clone https://github.com/apache/incubator-zeppelin
    $ cd incubator-zeppelin/
    $ mvn install -DskipTests

Then ran the server:

    $ bin/zeppelin-daemon.sh start

Running a simple notebook beginning with %pyspark, I got an error about py4j not being found. I just did pip install py4j (ref). Now I'm getting this error:

    pyspark is not responding
    Traceback (most recent call last):
      File "/tmp/zeppelin_pyspark.py", line 22, in
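For reference, SPARK_HOME should point at the root of the Spark build, i.e. the directory that contains bin/ and conf/. A sketch assuming the clone locations used above (the exact paths are assumptions):

    $ export SPARK_HOME="$HOME/spark"
    $ echo 'export SPARK_HOME="$HOME/spark"' >> conf/zeppelin-env.sh   # run from incubator-zeppelin/
    $ bin/zeppelin-daemon.sh restart

Setting it in conf/zeppelin-env.sh (copied from zeppelin-env.sh.template) makes Zeppelin use the external Spark instead of its embedded one, which also lets the pyspark interpreter find Spark's bundled py4j.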

Reading csv files in zeppelin using spark-csv

ε祈祈猫儿з submitted on 2019-11-27 23:47:59
Question: I want to read CSV files in Zeppelin and would like to use Databricks' spark-csv package: https://github.com/databricks/spark-csv. In the spark-shell, I can use spark-csv with:

    spark-shell --packages com.databricks:spark-csv_2.11:1.2.0

But how do I tell Zeppelin to use that package? Thanks in advance!

Answer 1: You need to add the Spark Packages repository to Zeppelin before you can use %dep on Spark packages:

    %dep
    z.reset()
    z.addRepo("Spark Packages Repo").url("http://dl.bintray.com/spark-packages
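For reference, the complete %dep pattern from Zeppelin's dynamic dependency loading looks like the sketch below; the /maven suffix on the repository URL and the z.load line are completions based on the spark-shell command above, so verify them against your Zeppelin version's docs:

    %dep
    z.reset()
    z.addRepo("Spark Packages Repo").url("http://dl.bintray.com/spark-packages/maven")
    z.load("com.databricks:spark-csv_2.11:1.2.0")

Note that a %dep paragraph must run before the Spark interpreter starts, so restart the interpreter first if it is already running.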

Zeppelin: Scala DataFrame to Python

僤鯓⒐⒋嵵緔 submitted on 2019-11-27 19:53:18
If I have a Scala paragraph with a DataFrame, can I share and use it with Python? (As I understand it, pyspark uses py4j.) I tried this:

Scala paragraph:

    x.printSchema
    z.put("xtable", x)

Python paragraph:

    %pyspark
    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns

    the_data = z.get("xtable")
    print the_data

    sns.set()
    g = sns.PairGrid(data=the_data,
                     x_vars=dependent_var,
                     y_vars=sensor_measure_columns_names + operational_settings_columns_names,
                     hue="UnitNumber", size=3, aspect=2.5)
    g = g.map(plt.plot, alpha=0.5)
    g = g.set(xlim=(300,0))
    g = g.add_legend()
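z.get on a Scala DataFrame typically comes back in Python as a py4j handle rather than something pandas/seaborn can use directly. A common workaround sketch, assuming the Scala paragraph instead runs x.registerTempTable("xtable") and that both interpreters share one SQLContext (Zeppelin's default):

    %pyspark
    # Pull the Scala-registered table through the shared SQLContext,
    # then convert it to a pandas DataFrame for seaborn/matplotlib.
    the_data = sqlContext.table("xtable").toPandas()
    print(the_data.head())

From there, the PairGrid code above can be applied to the_data unchanged.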

Error while configuring Apache Zeppelin on Windows 10

大城市里の小女人 submitted on 2019-11-27 08:56:20
Question: I get the following error while trying to install and configure Apache Zeppelin on Windows 10:

    org.apache.zeppelin.interpreter.InterpreterException: The filename, directory name, or volume label syntax is incorrect
        at org.apache.zeppelin.interpreter.remote.RemoteInterpreterManagedProcess.start(RemoteInterpreterManagedProcess.java:143)
        at org.apache.zeppelin.interpreter.remote.RemoteInterpreterProcess.reference(RemoteInterpreterProcess.java:73)
        at org.apache.zeppelin.interpreter.remote