apache-zeppelin

Running zeppelin on spark cluster mode

允我心安 submitted on 2019-12-01 11:10:34
I am using this tutorial, spark cluster on yarn mode in docker container, to launch Zeppelin against a Spark cluster in YARN mode. However, I am stuck at step 4: I can't find conf/zeppelin-env.sh in my Docker container to add further configuration. I tried putting these settings in Zeppelin's conf folder but have not been successful so far. Apart from that, the Zeppelin notebook is also not reachable on localhost:9001. I am very new to distributed systems; it would be great if someone could help me start Zeppelin on a Spark cluster in YARN mode. Here is my docker-compose file to let Zeppelin talk with the Spark cluster. version: '2'

How to convert a mllib matrix to a spark dataframe?

爱⌒轻易说出口 submitted on 2019-12-01 10:49:15
Question: I want to pretty-print the result of a correlation in a Zeppelin notebook: val Row(coeff: Matrix) = Correlation.corr(data, "features").head One way to achieve this is to convert the result into a DataFrame with each value in a separate column and call z.show(). However, looking at the Matrix API I don't see any way to do this. Is there another straightforward way to achieve it? Edit: The DataFrame has 50 columns. Just converting it to a string would not help, as the output gets truncated.
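
One way this conversion is commonly done (not quoted from the thread; a minimal sketch assuming Spark 2.x, the ml.linalg.Matrix returned by Correlation.corr, and hypothetical column names c0..cN):

import org.apache.spark.ml.linalg.Matrix
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{DoubleType, StructField, StructType}

// Turn the matrix into a DataFrame with one column per matrix column,
// so z.show() can render it as a table.
def matrixToDF(m: Matrix) = {
  val colNames = (0 until m.numCols).map(i => s"c$i")   // hypothetical names
  val schema = StructType(colNames.map(n => StructField(n, DoubleType, nullable = false)))
  val rows = m.rowIter.toSeq.map(v => Row.fromSeq(v.toArray.toSeq))
  spark.createDataFrame(spark.sparkContext.parallelize(rows), schema)
}

// Inside Zeppelin: z.show(matrixToDF(coeff))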

How to run zeppelin notebook from command line (automatically)

旧时模样 submitted on 2019-12-01 07:39:48
How do we run a notebook from the command line? And, further to that, how would I pass command-line arguments into the notebook, i.e. access them from within the notebook code? I had the same issue and managed to work out how to run a notebook through the API using curl. As for passing in command-line arguments, I think there is simply no way to do that; you would have to use some sort of shared state on the server (e.g. have the notebook read from a file, and modify that file). Anyway, this is how I managed to run a notebook; it assumes jq is installed. Pretty involved :( curl -XGET http:
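
For reference, the same "run all paragraphs" call can also be made without curl; this is only a sketch against the REST endpoint POST /api/notebook/job/[noteId] described in the Zeppelin REST API docs, and the host, port, and note id below are placeholders, not values from this thread:

import java.net.{HttpURLConnection, URL}
import scala.io.Source

// Trigger "run all paragraphs" for one note via Zeppelin's REST API.
val noteId = "2ABCDEFGH"                                   // placeholder note id
val url = new URL(s"http://localhost:8080/api/notebook/job/$noteId")
val conn = url.openConnection().asInstanceOf[HttpURLConnection]
conn.setRequestMethod("POST")

val status = conn.getResponseCode                          // 200 means the run was accepted
val body = Source.fromInputStream(conn.getInputStream).mkString
println(s"HTTP $status: $body")
conn.disconnect()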

Why Zeppelin notebook is not able to connect to S3

生来就可爱ヽ(ⅴ<●) submitted on 2019-12-01 07:34:40
I have installed Zeppelin on my AWS EC2 machine to connect to my Spark cluster. Spark version: standalone, spark-1.2.1-bin-hadoop1.tgz. I am able to connect to the Spark cluster, but I get the following error when trying to access a file in S3 for my use case. Code: sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "YOUR_KEY_ID") sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey","YOUR_SEC_KEY") val file = "s3n://<bucket>/<key>" val data = sc.textFile(file) data.count file: String = s3n://<bucket>/<key> data: org.apache.spark.rdd.RDD[String] = s3n://<bucket>/<key> MappedRDD[1] at textFile at
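
The error message itself is cut off above, but with spark-1.2.1-bin-hadoop1 a failure on s3n:// paths is often a missing S3/jets3t class on the classpath. One thing sometimes suggested (the coordinates and version here are assumptions, not taken from this thread) is to load the dependency through Zeppelin's %dep interpreter in a paragraph that runs before any Spark code:

%dep
z.reset()
z.load("net.java.dev.jets3t:jets3t:0.9.0")   // version is an assumption; match your Hadoop build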

How can I pretty print a data frame in Zeppelin/Spark/Scala?

无人久伴 submitted on 2019-12-01 02:34:37
I am using Spark 2 and Scala 2.11 in a Zeppelin 0.7 notebook. I have a dataframe that I can print like this: dfLemma.select("text", "lemma").show(20, false) and the output looks like: [a very wide fixed-width ASCII table with two columns, |text| and |lemma|]
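
One commonly suggested alternative (a sketch assuming the same dfLemma DataFrame) is to hand the frame to Zeppelin's display system instead of show(), which renders an interactive table rather than fixed-width ASCII:

// Render as a sortable Zeppelin table.
z.show(dfLemma.select("text", "lemma"))

// Or register a temp view and query it from a %sql paragraph:
// dfLemma.createOrReplaceTempView("lemmas")
// then, in another paragraph:  %sql select text, lemma from lemmas limit 20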

Is it possible to customize the skin on Zeppelin?

耗尽温柔 submitted on 2019-11-30 18:45:26
Is it possible to customize the skin on Zeppelin? In other words, replace the Zeppelin logo with something else? Yes, it is very much possible. As you know, Apache Zeppelin (incubating) is an open source project, so you just need to: clone it from github.com/apache/incubator-zeppelin, make your modifications inside the zeppelin-web sub-module (it is a standard Angular web application, so you can change anything), and build it. That is basically it. There are at least two companies known to have successfully followed these steps. As already mentioned in bzz's answer: it is possible to customise the UI of Zeppelin. Here

Using d3.js with Apache Zeppelin

為{幸葍}努か submitted on 2019-11-30 17:37:07
I'm trying to add more visualization options to Apache Zeppelin by integrating it with d3.js. I found an example where someone did this with leaflet.js here, and tried to do something similar. Unfortunately I'm not too familiar with AngularJS (which Zeppelin uses to interpret front-end languages), and I'm also not streaming data. Below is my code, using just a simple tutorial example from d3.js: %angular <div> <svg class="chart"></svg> </div> <script> function useD3() { var data = [4, 8, 15, 16, 23, 42]; var width = 420, barHeight = 20; var x = d3.scale.linear() .domain([0, d3.max(data)]) .range([0,
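
One pattern that often helps here (not from the original post; the name chartData is hypothetical): compute the data in a Scala paragraph and push it into the Angular scope with z.angularBind, so the %angular/d3 paragraph can read it instead of hard-coding the array:

// In a %spark (Scala) paragraph: bind a value into Zeppelin's Angular scope.
val data = Seq(4, 8, 15, 16, 23, 42)
z.angularBind("chartData", data.mkString(","))   // "chartData" is a hypothetical name

// A following %angular paragraph can then reference {{chartData}} (or watch it
// from JavaScript), parse the values, and hand them to d3 for rendering.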

How to get the output from console streaming sink in Zeppelin?

本小妞迷上赌 submitted on 2019-11-30 09:05:28
I'm struggling to get the console sink working with PySpark Structured Streaming when run from Zeppelin. Basically, I'm not seeing any results printed to the screen, or to any log files I've found. My question: does anyone have a working example of using PySpark Structured Streaming with a sink that produces output visible in Apache Zeppelin? Ideally it would also use the socket source, as that's easy to test with. I'm using: Ubuntu 16.04, spark-2.2.0-bin-hadoop2.7, zeppelin-0.7.3-bin-all, Python 3. I've based my code on the structured_network_wordcount.py example. It works when run from the
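
The question asks about PySpark, but since the console sink's output goes to the interpreter process's stdout rather than the notebook, one commonly used workaround (sketched here in Scala; the table name counts is an assumption) is to write to a memory sink and query the named in-memory table from a later paragraph:

import org.apache.spark.sql.functions._
import spark.implicits._

// Socket source, as in structured_network_wordcount.
val lines = spark.readStream
  .format("socket")
  .option("host", "localhost")
  .option("port", 9999)
  .load()

val counts = lines
  .select(explode(split($"value", " ")).as("word"))
  .groupBy("word")
  .count()

// Memory sink: results land in an in-memory table named "counts".
val query = counts.writeStream
  .format("memory")
  .queryName("counts")
  .outputMode("complete")
  .start()

// In a later paragraph, poll the table:
// z.show(spark.sql("select * from counts"))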

Register UDF to SqlContext from Scala to use in PySpark

拟墨画扇 submitted on 2019-11-30 08:43:00
Question: Is it possible to register a UDF (or function) written in Scala so it can be used in PySpark? E.g.: val mytable = sc.parallelize(1 to 2).toDF("spam") mytable.registerTempTable("mytable") def addOne(m: Integer): Integer = m + 1 // Spam: 1, 2 In Scala, the following is now possible: val UDFaddOne = sqlContext.udf.register("UDFaddOne", addOne _) val mybiggertable = mytable.withColumn("moreSpam", UDFaddOne(mytable("spam"))) // Spam: 1, 2 // moreSpam: 2, 3 I would like to use "UDFaddOne" in PySpark like
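
What usually makes this work (a sketch under the assumption that, as in Zeppelin's default setup, the %spark and %pyspark interpreters of one interpreter group share the same SparkContext/SQLContext): register the UDF by name in Scala, then call it from PySpark through SQL rather than through the Python UDF API:

// %spark (Scala) paragraph: build the table and register the UDF by name.
import sqlContext.implicits._

val mytable = sc.parallelize(1 to 2).toDF("spam")
mytable.registerTempTable("mytable")

def addOne(m: Integer): Integer = m + 1
sqlContext.udf.register("UDFaddOne", addOne _)

// %pyspark paragraph (Python, shown here only as a comment to keep this block Scala):
//   sqlContext.sql("SELECT spam, UDFaddOne(spam) AS moreSpam FROM mytable").show()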