apache-zeppelin

Zeppelin throwing NullPointerException while configuring

无人久伴 submitted on 2019-12-31 04:28:09
Question: I am trying to set up zeppelin-0.8.0 on my Windows 8 R2 OS. I already have Spark running on my console, i.e. SPARK_HOME, JAVA_HOME, and HADOOP_HOME are set up and running fine. But when I try to execute println("hello") in the Zeppelin Spark interpreter, it throws the error below ... I have already set SPARK_HOME and JAVA_HOME in the zeppelin-env.cmd file. Error: DEBUG [2019-01-22 10:05:34,129] ({pool-2-thread-2} RemoteInterpreterManagedProcess.java[start]:153) - callbackServer is serving now INFO [2019-01
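On Windows, Zeppelin reads environment settings from conf\zeppelin-env.cmd. A minimal sketch of that file, assuming local installs of Spark, the JDK, and Hadoop winutils (all paths below are hypothetical placeholders, not values from the question):

```bat
REM conf\zeppelin-env.cmd -- hypothetical paths; adjust to your installation
set JAVA_HOME=C:\Program Files\Java\jdk1.8.0_191
set SPARK_HOME=C:\spark-2.3.2-bin-hadoop2.7
set HADOOP_HOME=C:\hadoop-2.7.7

REM On Windows, Spark needs winutils.exe under %HADOOP_HOME%\bin
set PATH=%PATH%;%HADOOP_HOME%\bin
```

A missing or wrong winutils.exe is a frequent cause of startup NPEs for Spark on Windows, so it is worth checking alongside the variables themselves.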

Zeppelin - Cannot query with %sql a table I registered with pyspark

大憨熊 submitted on 2019-12-30 08:36:07
Question: I am new to Spark/Zeppelin and wanted to complete a simple exercise where I transform a CSV file from pandas to a Spark data frame, then register the table to query it with SQL and visualise it using Zeppelin. But I seem to be failing at the last step. I am using Spark 1.6.1. Here is my code: %pyspark spark_clean_df.registerTempTable("table1") print spark_clean_df.dtypes print sqlContext.sql("select count(*) from table1").collect() Here is the output: [('id', 'bigint'), ('name',
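A common cause of this symptom is registering the temp table against a different SQLContext than the one Zeppelin's %sql interpreter queries; the registration must go through the sqlContext Zeppelin injects. A minimal Scala sketch of the same flow (the sample rows and column names are made up for illustration):

```scala
// %spark paragraph -- uses the sqlContext that Zeppelin injects,
// which is the same context the %sql interpreter queries
import sqlContext.implicits._

// Hypothetical sample data standing in for the pandas-derived frame
val cleanDf = Seq((1L, "alice"), (2L, "bob")).toDF("id", "name")

// Register against the SAME shared context (Spark 1.x API)
cleanDf.registerTempTable("table1")

// Sanity check from code; a %sql paragraph running
//   select count(*) from table1
// should now return the same count
println(sqlContext.sql("select count(*) from table1").collect().mkString)
```

If a new SQLContext is created inside a %pyspark paragraph, tables registered on it are invisible to %sql, which matches the failure described above.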

Apache Zeppelin - How to use Helium framework in Apache Zeppelin

拟墨画扇 submitted on 2019-12-30 02:17:07
Question: Since Zeppelin 0.7, Zeppelin has supported Helium plugins/packages via the Helium framework. However, I am not able to view any of the plugins on the Helium page (localhost:8080/#/helium). As per this JIRA, I placed a sample helium.json (available on S3) under /local-repo/helium-registry-cache. However, after that I got an NPE while restarting the Apache Zeppelin service. I have tried Zeppelin 0.7 as well as Zeppelin 0.8.0 snapshot versions. In particular, I want to use the map Helium package - Helium-Map

Remove Temporary Tables from Apache Spark SQL

血红的双手。 submitted on 2019-12-29 04:17:25
Question: I have registered a temp table in Apache Spark using Zeppelin as below: val hvacText = sc.textFile("...") case class Hvac(date: String, time: String, targettemp: Integer, actualtemp: Integer, buildingID: String) val hvac = hvacText.map(s => s.split(",")).filter(s => s(0) != "Date").map( s => Hvac(s(0), s(1), s(2).toInt, s(3).toInt, s(6))).toDF() hvac.registerTempTable("hvac") After I am done with my queries on this temp table, how do I remove it? I checked all the docs and it seems I am getting
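In Spark 1.x, the SQLContext exposes dropTempTable for exactly this; a short sketch continuing the example above:

```scala
// Spark 1.x: removes the "hvac" registration from the catalog.
// This only drops the temp-table name; the underlying DataFrame
// and source data are untouched.
sqlContext.dropTempTable("hvac")

// Spark 2.x equivalent, for reference (catalog API):
// spark.catalog.dropTempView("hvac")
```

Temp tables are also scoped to the context's lifetime, so they disappear on their own when the Zeppelin interpreter is restarted.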

AWS Redshift driver in Zeppelin

耗尽温柔 submitted on 2019-12-24 17:17:03
Question: I want to explore my data in Redshift using a Zeppelin notebook. A small EMR cluster with Spark is running behind it. I am loading Databricks' spark-redshift library %dep z.reset() z.load("com.databricks:spark-redshift_2.10:0.6.0") and then import org.apache.spark.sql.DataFrame val query = "..." val url = "..." val port=5439 val table = "..." val database = "..." val user = "..." val password = "..." val df: DataFrame = sqlContext.read .format("com.databricks.spark.redshift") .option("url", s"jdbc
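For context, a hedged sketch of what a complete spark-redshift read usually looks like once the options are filled in. The cluster host, credentials, table, and S3 temp dir below are all placeholders, not values from the question:

```scala
import org.apache.spark.sql.DataFrame

// All connection values are hypothetical placeholders
val url = "jdbc:redshift://examplecluster.abc123.us-east-1.redshift.amazonaws.com:5439/mydb"

val df: DataFrame = sqlContext.read
  .format("com.databricks.spark.redshift")
  .option("url", s"$url?user=myuser&password=mypassword")
  .option("query", "select * from my_table limit 10")  // or .option("dbtable", "my_table")
  .option("tempdir", "s3n://my-bucket/tmp")            // spark-redshift stages data through S3
  .load()

df.printSchema()
```

The tempdir option is mandatory for this library: it unloads Redshift data to S3 and reads it from there, so the EMR cluster also needs S3 credentials visible to Spark.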

Dynamic interactive dashboard with Zeppelin notebook

点点圈 submitted on 2019-12-24 08:01:14
Question: I want to have a more interactive dashboard: read data from a database, feed it to a select box, and on change of the select box send the value and run the query. I want to achieve this using Zeppelin because I have to display analytics for the selected value. What would be the way to achieve this, and is it possible in Zeppelin? I tried with a select box, but I could not save the selected value, send it to the next query, and execute that. Something like select age, count(1) value
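Zeppelin's dynamic forms cover this pattern: z.select renders a select box in the paragraph and returns the chosen value, which the rest of the paragraph can use. A Scala sketch with made-up options (in real use the option Seq would be built from a database query):

```scala
// Hypothetical options; in practice build this Seq from a query result.
// Each entry is (value, displayed label).
val ageOptions = Seq(("20", "20"), ("30", "30"), ("40", "40"))

// Renders a select box; re-running the paragraph picks up the new selection
val selectedAge = z.select("age", ageOptions).toString

// Use the selection in the next query
val result = sqlContext.sql(
  s"select age, count(1) value from people where age = $selectedAge group by age")
```

As an alternative, %sql paragraphs can embed a form directly with the ${age=20,20|30|40} template syntax, which avoids passing the value between paragraphs at all.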

ImportError: No module named sparkdl.image.imageIO

强颜欢笑 submitted on 2019-12-24 07:58:36
Question: I'm doing image classification using Spark. I have already imported the sparkdl jar (added the path of the jar in conf/spark.default) ImportError: No module named sparkdl.image.imageIO at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:193) at org.apache.spark.api.python.PythonRunner$$anon$1.<init>(PythonRDD.scala:234) at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:152) at org.apache.spark.sql.execution.python.BatchEvalPythonExec$$anonfun$doExecute$1.apply
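sparkdl ships as a Spark package containing both JVM classes and a Python package, so adding the jar to the classpath alone does not make the Python side importable. A hedged sketch of the usual setup via spark.jars.packages, which also wires up the Python files (the version coordinate is an example, not taken from the question; pick the one matching your Spark version):

```properties
# conf/spark-defaults.conf -- example coordinate from the spark-packages repo
spark.jars.packages  databricks:spark-deep-learning:1.5.0-spark2.4-s_2.11
```

The same effect can be had on the command line with --packages, or in Zeppelin by adding the coordinate to the Spark interpreter's dependency settings.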

Zeppelin 0.7.2 version does not support spark 2.2.0

你说的曾经没有我的故事 submitted on 2019-12-24 03:15:55
Question: How do I downgrade the Spark version? What could the other solutions be? I have to connect my Hive tables to Spark using a Spark session, but the Spark version is not supported by Zeppelin. Answer 1: Here are 2 reasons. [1] Zeppelin 0.7.2 marked Spark 2.2+ as an unsupported version. https://github.com/apache/zeppelin/blob/v0.7.2/spark/src/main/java/org/apache/zeppelin/spark/SparkVersion.java#L40 public static final SparkVersion UNSUPPORTED_FUTURE_VERSION = SPARK_2_2_0; [2] Even if you change the

Zeppelin imported classes not found when using

寵の児 submitted on 2019-12-24 02:16:58
Question: I get a weird error when using Spark on Zeppelin: the imported classes are not found when I use them. The code sample is: %spark import java.io.Serializable import java.text.{ParseException, SimpleDateFormat} import java.util.{Calendar, SimpleTimeZone} class Pos(val pos: String) extends Serializable { if (pos.length != 12) { throw new IllegalArgumentException(s"[${pos}] seems not a valid pos string") } private val cstFormat = new SimpleDateFormat("yyyyMMddHHmm") private val utcFormat = new
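This is a known quirk of the Spark REPL wrapping that Zeppelin relies on: paragraph-level imports are sometimes not visible inside a class body once the paragraph is compiled and serialized. A common workaround is to use fully qualified names inside the class instead of relying on the imports; a sketch based on the code above:

```scala
// Workaround sketch: avoid depending on paragraph-level imports inside the class
class Pos(val pos: String) extends java.io.Serializable {
  if (pos.length != 12) {
    throw new IllegalArgumentException(s"[${pos}] seems not a valid pos string")
  }
  // Fully qualified, so the REPL-wrapped class compiles regardless of
  // whether the earlier import statements are in scope
  private val cstFormat = new java.text.SimpleDateFormat("yyyyMMddHHmm")
  private val utcFormat = new java.text.SimpleDateFormat("yyyyMMddHHmm")
}
```

Moving the class definition into its own paragraph, or compiling it into a jar added as an interpreter dependency, are other routes around the same issue.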

Building the zeppelin-0.7.0 master branch with Spark 2.0 fails with 'yarn install --no-lockfile' failed

自作多情 submitted on 2019-12-23 23:51:37
Question: I tried to build the zeppelin-0.7.0 master branch downloaded from GitHub, but it failed. The build command: mvn package -Pyarn -Pbuild-distr -Pspark-2.0 -Dspark.version=2.0.1 -Phadoop-2.6 -Dhadoop.version=2.6.0 -Pscala-2.11 -Ppyspark -DskipTests -X The output stack trace is: [ERROR] error Command failed with exit code 1. [INFO] info Visit https://yarnpkg.com/en/docs/cli/install for documentation about this command. [INFO] ------------------------------------------------------------------------
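The zeppelin-web module drives yarn through the frontend-maven-plugin, and an "exit code 1" at the 'yarn install' step is very often a network or proxy problem rather than a code problem. One hedged workaround is to give the bundled yarn/npm working proxy and registry settings (all values below are placeholders, only needed behind a proxy):

```properties
# ~/.npmrc -- placeholder proxy values
registry=https://registry.npmjs.org/
proxy=http://proxy.example.com:8080
https-proxy=http://proxy.example.com:8080
```

If the web UI is not needed for the build at hand, another common escape hatch is to exclude that module from the Maven reactor (e.g. mvn ... -pl '!zeppelin-web'); rerunning with -X, as the question already does, shows the underlying yarn error in full.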