apache-zeppelin

Apache Zeppelin - How to use Helium framework in Apache Zeppelin

假装没事ソ submitted on 2019-11-30 07:21:05
Since Zeppelin 0.7, Zeppelin has supported Helium plugins/packages through the Helium framework. However, I am not able to view any of the plugins on the Helium page (localhost:8080/#/helium). As per this JIRA, I placed the sample Helium.json (available on S3) under /local-repo/helium-registry-cache. However, after that I got an NPE while restarting the Apache Zeppelin service. I have tried Zeppelin 0.7 as well as Zeppelin 0.8.0 snapshot versions. In particular, I want to use the map Helium package - Helium-Map - in a Zeppelin note. Can someone point me to a guide or documentation with detailed steps for using
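For orientation, a Helium package is described by a small JSON descriptor. Below is a minimal sketch of writing one into the local registry cache; the descriptor fields follow the documented VISUALIZATION format, but the package name, artifact, and the array wrapping of the cache file are illustrative assumptions, not taken from the question:

    import json
    import os

    # Hypothetical descriptor for an npm-published Helium visualization.
    descriptor = {
        "type": "VISUALIZATION",
        "name": "zeppelin_helium_map",              # assumed package name
        "description": "Map visualization (example)",
        "artifact": "zeppelin-helium-map@0.0.1",    # npm coordinate: name@version
        "license": "Apache-2.0",
        "icon": "<i class='fa fa-map'></i>",
    }

    cache_dir = "local-repo/helium-registry-cache"  # relative to ZEPPELIN_HOME
    os.makedirs(cache_dir, exist_ok=True)
    with open(os.path.join(cache_dir, "helium.json"), "w") as f:
        json.dump([descriptor], f, indent=2)        # assumed layout: array of descriptors

If the file the registry cache reads is malformed for your version, Zeppelin can fail during startup, which would match the NPE described above.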

Using d3.js with Apache Zeppelin

 ̄綄美尐妖づ submitted on 2019-11-30 02:35:15
Question: I'm trying to add more visualization options to Apache Zeppelin by integrating it with d3.js. I found an example where someone did this with leaflet.js here, and tried to do something similar -- unfortunately I'm not too familiar with AngularJS (which Zeppelin uses to interpret front-end code). I'm also not streaming data. Below is my code, using just a simple tutorial example from d3.js:

    %angular
    <div>
      <svg class="chart"></svg>
    </div>
    <script>
    function useD3() {
      var data = [4, 8, 15, 16, 23,
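One way to avoid hardcoding the numbers inside the <script> is to bind them from a Spark paragraph and read them from the Angular scope. A minimal sketch, assuming z.angularBind is exposed to the pyspark interpreter in your Zeppelin version, and using chartData as a made-up variable name:

    %pyspark
    # Bind the sample data so a following %angular paragraph can read it
    # from its scope (e.g. as {{chartData}} or via a scope watcher).
    z.angularBind("chartData", [4, 8, 15, 16, 23, 42])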

How to get the output from console streaming sink in Zeppelin?

亡梦爱人 submitted on 2019-11-29 14:23:24
Question: I'm struggling to get the console sink working with PySpark Structured Streaming when run from Zeppelin. Basically, I'm not seeing any results printed to the screen or in any log files I've found. My question: does anyone have a working example of using PySpark Structured Streaming with a sink that produces output visible in Apache Zeppelin? Ideally it would also use the socket source, as that's easy to test with. I'm using: Ubuntu 16.04, spark-2.2.0-bin-hadoop2.7, zeppelin-0.7.3-bin-all
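For context, here is a minimal standalone sketch of the socket-source/console-sink pipeline in question; the host, port, and output mode are assumptions, and you would start a feed first with nc -lk 9999:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("console-sink-demo").getOrCreate()

    # Read lines from a local socket (start one with: nc -lk 9999).
    lines = (spark.readStream
             .format("socket")
             .option("host", "localhost")
             .option("port", 9999)
             .load())

    # Print each micro-batch to stdout. Under Zeppelin, this stdout belongs to
    # the interpreter process, so batches typically land in the interpreter
    # log rather than in the notebook's paragraph output.
    query = (lines.writeStream
             .outputMode("append")
             .format("console")
             .start())

    query.awaitTermination()

The comment above is also the usual explanation for "no output": the console sink writes to the driver's stdout, which Zeppelin captures in its interpreter log files, not in the note.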

Register UDF to SqlContext from Scala to use in PySpark

蓝咒 submitted on 2019-11-29 07:25:39
Is it possible to register a UDF (or function) written in Scala for use in PySpark? E.g.:

    val mytable = sc.parallelize(1 to 2).toDF("spam")
    mytable.registerTempTable("mytable")
    def addOne(m: Integer): Integer = m + 1
    // Spam: 1, 2

In Scala, the following is now possible:

    val UDFaddOne = sqlContext.udf.register("UDFaddOne", addOne _)
    val mybiggertable = mytable.withColumn("moreSpam", UDFaddOne(mytable("spam")))
    // Spam: 1, 2
    // moreSpam: 2, 3

I would like to use "UDFaddOne" in PySpark like:

    %pyspark
    mytable = sqlContext.table("mytable")
    UDFaddOne = sqlContext.udf("UDFaddOne")  # does not work
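One workaround sketch: because Zeppelin's Scala and pyspark paragraphs share the same SQLContext by default, a UDF registered by name in Scala can be invoked from Python through a SQL expression (the table and column names follow the example above):

    %pyspark
    from pyspark.sql.functions import expr

    mytable = sqlContext.table("mytable")

    # Call the Scala-registered UDF by name inside a SQL expression.
    mybiggertable = mytable.withColumn("moreSpam", expr("UDFaddOne(spam)"))
    mybiggertable.show()

Equivalently, sqlContext.sql("SELECT spam, UDFaddOne(spam) AS moreSpam FROM mytable") works, since registration via sqlContext.udf.register makes the function visible to the SQL parser.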

Remove Temporary Tables from Apache Spark SQL

一个人想着一个人 submitted on 2019-11-28 21:21:12
I have registered a temp table in Apache Spark using Zeppelin as below:

    val hvacText = sc.textFile("...")
    case class Hvac(date: String, time: String, targettemp: Integer, actualtemp: Integer, buildingID: String)
    val hvac = hvacText.map(s => s.split(",")).filter(s => s(0) != "Date").map(s => Hvac(s(0), s(1), s(2).toInt, s(3).toInt, s(6))).toDF()
    hvac.registerTempTable("hvac")

After I am done with my queries against this temp table, how do I remove it? I checked all the docs and it seems I am getting nowhere. Any guidance?

Spark 2.x: For temporary views you can use Catalog.dropTempView: spark.catalog
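The truncated answer presumably continues with spark.catalog.dropTempView; for reference, the equivalent calls from a %pyspark paragraph:

    %pyspark
    # Spark 2.x: temp tables are temporary views.
    spark.catalog.dropTempView("hvac")

    # Spark 1.x equivalent:
    # sqlContext.dropTempTable("hvac")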

What to set `SPARK_HOME` to?

耗尽温柔 submitted on 2019-11-28 20:26:39
Installed apache-maven-3.3.3 and scala 2.11.6, then ran:

    $ git clone git://github.com/apache/spark.git -b branch-1.4
    $ cd spark
    $ build/mvn -DskipTests clean package

Finally:

    $ git clone https://github.com/apache/incubator-zeppelin
    $ cd incubator-zeppelin/
    $ mvn install -DskipTests

Then ran the server:

    $ bin/zeppelin-daemon.sh start

Running a simple notebook beginning with %pyspark, I got an error about py4j not being found. I just did pip install py4j (ref). Now I'm getting this error:

    pyspark is not responding
    Traceback (most recent call last):
      File "/tmp/zeppelin_pyspark.py", line 22, in
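For reference, SPARK_HOME should point at the root of the Spark build, i.e. the directory that contains bin/ and conf/. A sketch assuming the clone locations used above (the exact paths are assumptions):

    $ export SPARK_HOME="$HOME/spark"
    $ echo 'export SPARK_HOME="$HOME/spark"' >> conf/zeppelin-env.sh   # run from incubator-zeppelin/
    $ bin/zeppelin-daemon.sh restart

Setting it in conf/zeppelin-env.sh (copied from zeppelin-env.sh.template) makes Zeppelin use the external Spark instead of its embedded one, which also lets the pyspark interpreter find Spark's bundled py4j.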

Reading csv files in zeppelin using spark-csv

ε祈祈猫儿з submitted on 2019-11-27 23:47:59
Question: I want to read CSV files in Zeppelin and would like to use Databricks' spark-csv package: https://github.com/databricks/spark-csv. In the spark-shell, I can use spark-csv with:

    spark-shell --packages com.databricks:spark-csv_2.11:1.2.0

But how do I tell Zeppelin to use that package? Thanks in advance!

Answer 1: You need to add the Spark Packages repository to Zeppelin before you can use %dep on Spark packages:

    %dep
    z.reset()
    z.addRepo("Spark Packages Repo").url("http://dl.bintray.com/spark-packages
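For reference, the complete %dep pattern from Zeppelin's dynamic dependency loading looks like the sketch below; the /maven suffix on the repository URL and the z.load line are completions based on the spark-shell command above, so verify them against your Zeppelin version's docs:

    %dep
    z.reset()
    z.addRepo("Spark Packages Repo").url("http://dl.bintray.com/spark-packages/maven")
    z.load("com.databricks:spark-csv_2.11:1.2.0")

Note that a %dep paragraph must run before the Spark interpreter starts, so restart the interpreter first if it is already running.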

Zeppelin: Scala DataFrame to Python

僤鯓⒐⒋嵵緔 submitted on 2019-11-27 19:53:18
If I have a Scala paragraph with a DataFrame, can I share and use it with Python? (As I understand it, pyspark uses py4j.) I tried this:

Scala paragraph:

    x.printSchema
    z.put("xtable", x)

Python paragraph:

    %pyspark
    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns

    the_data = z.get("xtable")
    print the_data

    sns.set()
    g = sns.PairGrid(data=the_data,
                     x_vars=dependent_var,
                     y_vars=sensor_measure_columns_names + operational_settings_columns_names,
                     hue="UnitNumber", size=3, aspect=2.5)
    g = g.map(plt.plot, alpha=0.5)
    g = g.set(xlim=(300,0))
    g = g.add_legend()
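z.get on a Scala DataFrame typically comes back in Python as a py4j handle rather than something pandas/seaborn can use directly. A common workaround sketch, assuming the Scala paragraph instead runs x.registerTempTable("xtable") and that both interpreters share one SQLContext (Zeppelin's default):

    %pyspark
    # Pull the Scala-registered table through the shared SQLContext,
    # then convert it to a pandas DataFrame for seaborn/matplotlib.
    the_data = sqlContext.table("xtable").toPandas()
    print(the_data.head())

From there, the PairGrid code above can be applied to the_data unchanged.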

Error while configuring Apache Zeppelin on Windows 10

大城市里の小女人 submitted on 2019-11-27 08:56:20
Question: I get the following error while trying to install and configure Apache Zeppelin on Windows 10:

    org.apache.zeppelin.interpreter.InterpreterException: The filename, directory name, or volume label syntax is incorrect
        at org.apache.zeppelin.interpreter.remote.RemoteInterpreterManagedProcess.start(RemoteInterpreterManagedProcess.java:143)
        at org.apache.zeppelin.interpreter.remote.RemoteInterpreterProcess.reference(RemoteInterpreterProcess.java:73)
        at org.apache.zeppelin.interpreter.remote