apache-zeppelin

Scala and Spark UDF function

Submitted by 六眼飞鱼酱① on 2019-12-08 17:01:02
Question: I made a simple UDF to convert or extract some values from a time field in a temp table in Spark. I registered the function, but when I call it from SQL it throws a NullPointerException. Below is my function and the process of executing it. I am using Zeppelin. Strangely, this was working yesterday but it stopped working this morning. Function: def convert( time:String ) : String = { val sdf = new java.text.SimpleDateFormat("HH:mm") val time1 = sdf.parse(time) return sdf.format(time1) }
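A common cause of this NullPointerException is a null value in the time column: Spark passes null for missing values, and SimpleDateFormat.parse(null) throws an NPE. A minimal null-safe sketch of the same conversion (the name convertSafe is mine, not from the post):

```scala
import java.text.SimpleDateFormat
import scala.util.Try

// Null-safe variant of the convert UDF: wrap the input in Option so null
// rows yield None, and wrap parse/format in Try so malformed strings do too.
def convertSafe(time: String): Option[String] =
  Option(time).flatMap { t =>
    val sdf = new SimpleDateFormat("HH:mm")
    Try(sdf.format(sdf.parse(t))).toOption
  }
```

Registered as a Spark UDF, this returns null in SQL for bad or missing inputs instead of crashing the query.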

Issues Installing Zeppelin on CentOS 6 with Vagrant

Submitted by ε祈祈猫儿з on 2019-12-08 09:18:26
Question: We are trying to stand up a sandbox/evaluation instance of Zeppelin on a 4-node CentOS 6 cluster with Vagrant, and are having some issues with dependencies in the build process. Here is the high-level script we're running. (We have tried running this as a privileged account and as a regular user, with the same results.) Recreate steps: install Hadoop 2.7.0 from binary; install Spark 1.4.0 from binary; install Maven 3.3.3 from binary; run the following: curl --silent --location https://rpm.nodesource.com/setup |

Zeppelin with Spark interpreter ignores imports declared outside of class/function definition

Submitted by 谁都会走 on 2019-12-08 06:50:50
Question: I'm trying to use some Scala code in Zeppelin 0.8.0 with the Spark interpreter: %spark import scala.beans.BeanProperty class Node(@BeanProperty val parent: Option[Node]) { } But the import does not seem to be taken into account: import scala.beans.BeanProperty <console>:14: error: not found: type BeanProperty @BeanProperty val parent: Option[Node]) { ^ EDIT: I found out that the following code works: class Node(@scala.beans.BeanProperty val parent: Option[Node]) { } This also works fine: def loadCsv
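The workaround the poster found is to fully qualify the annotation so it does not depend on a top-level import that the interpreter may drop. In plain Scala the qualified form compiles and behaves as expected:

```scala
// Fully-qualified annotation: no top-level import required, so it survives
// interpreters that wrap each statement in its own scope.
class Node(@scala.beans.BeanProperty val parent: Option[Node])

// @BeanProperty generates a JavaBean-style getter alongside the Scala accessor.
val root = new Node(None)
println(root.getParent)
```

The same trick applies to any annotation or type reference that a Zeppelin paragraph fails to resolve from an earlier import line.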

How can I pretty print a wrappedarray in Zeppelin/Spark/Scala?

Submitted by 不羁的心 on 2019-12-08 04:07:27
In this question I was told how to print a DataFrame using Zeppelin's z.show command. This works well except for 'WrappedArray' appearing in the lemma column. I have tried this: z.show(dfLemma.select(concat_ws(",", $"lemma"))) but it just gave me a list of words, not nicely formatted, and I also want the racist column in my output. Any help is much appreciated. Here's a suggestion for formatting your array column: import org.apache.spark.sql.Column import org.apache.spark.sql.functions._ import sqlContext.implicits._ val df = Seq( (1, Array("An", "Array")), (2, Array("Another", "Array")) ).toDF
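Stripped of the Spark machinery, the pretty-printing step is just joining the array's elements with a separator and brackets; in Spark this logic could be wrapped in a UDF (or approximated with concat_ws). A pure-Scala sketch of that formatting function:

```scala
// What the WrappedArray formatting boils down to: join the words with a
// separator and surround them with brackets.
val formatLemma: Seq[String] => String = words => words.mkString("[", ", ", "]")

println(formatLemma(Seq("An", "Array")))   // prints "[An, Array]"
```

Selecting the formatted column alongside the other columns (rather than alone) keeps the rest of the row, e.g. the racist column, in the z.show output.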

Connect Apache Zeppelin to Hive

Submitted by 末鹿安然 on 2019-12-08 02:16:58
Question: I am trying to connect my Apache Zeppelin instance to my Hive metastore. I use Zeppelin 0.7.3, so there is no Hive interpreter, only JDBC. I have copied my hive-site.xml to the Zeppelin conf folder, but I don't know how to create a new Hive interpreter. I also tried to access Hive tables through Spark's HiveContext, but when I try this way I cannot see my Hive databases; only a default database is shown. Can someone explain either how to create a Hive interpreter or how to access my Hive metastore through
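In Zeppelin 0.7.x, Hive access is configured as a prefix on the JDBC interpreter (Interpreter menu → jdbc → edit) rather than as a separate interpreter. A sketch of the relevant properties, assuming HiveServer2 runs on localhost:10000 (host, port, and credentials are placeholders):

```properties
# JDBC interpreter properties; "hive" is the prefix you then select
# in a paragraph with %jdbc(hive).
hive.driver=org.apache.hive.jdbc.HiveDriver
hive.url=jdbc:hive2://localhost:10000
hive.user=hive
hive.password=
```

The interpreter also needs the Hive JDBC driver on its classpath, typically added as artifact dependencies (e.g. org.apache.hive:hive-jdbc and org.apache.hadoop:hadoop-common, versions matching the cluster) in the same interpreter settings page.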

Apache zeppelin build process failure in zeppelin-web with bower

Submitted by 隐身守侯 on 2019-12-07 14:42:44
Question: I am trying to build Zeppelin locally with Windows and babun/cygwin. This site got me headed in the right direction, but I run into the following error when the build gets to the web application: [ERROR] Failed to execute goal com.github.eirslett:frontend-maven-plugin:0.0.23:bower (bower install) on project zeppelin-web: Failed to run task: 'bower --allow-root install' failed. (error code 8) -> [Help 1] I can go into the zeppelin-web directory and run bower install successfully, but I'm not sure

Apache Zeppelin installation grunt build error

Submitted by 蹲街弑〆低调 on 2019-12-07 04:46:43
Question: My configuration is as follows: Ubuntu 15.04, Java 1.7, Spark 1.4.1, Hadoop 2.7, Maven 3.3.3. I am trying to install Apache Zeppelin after successfully cloning it from GitHub and using the following command: mvn clean package -DskipTests Despite several attempts, I am getting the following error after some initial success: [ERROR] Failed to execute goal com.github.eirslett:frontend-maven-plugin:0.0.23:grunt (grunt build) on project zeppelin-web: Failed to run task: 'grunt --no-color' failed. (error

Spark 1.6: filtering DataFrames generated by describe()

Submitted by 可紊 on 2019-12-06 23:36:54
Question: The problem arises when I call the describe function on a DataFrame: val statsDF = myDataFrame.describe() Calling describe yields the following output: statsDF: org.apache.spark.sql.DataFrame = [summary: string, count: string] I can show statsDF normally by calling statsDF.show() +-------+------------------+ |summary| count| +-------+------------------+ | count| 53173| | mean|104.76128862392568| | stddev|3577.8184333911513| | min| 1| | max| 558407| +-------+------------------+ I would
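The catch with describe() is that every statistic, including mean and stddev, comes back as a string, so picking one value out means filtering on the summary column and then casting. A pure-Scala model of the rows above as (summary, value) pairs shows the filter-then-cast step:

```scala
// describe() returns every statistic as a string; modelling its rows as
// (summary, value) pairs, extracting the mean is a lookup plus a cast.
val stats = Seq(
  ("count",  "53173"),
  ("mean",   "104.76128862392568"),
  ("stddev", "3577.8184333911513"),
  ("min",    "1"),
  ("max",    "558407"))

val mean: Option[Double] = stats.toMap.get("mean").map(_.toDouble)
println(mean)
```

On the DataFrame itself the equivalent is filtering on the summary column (e.g. rows where summary equals "mean") and casting the resulting string column to double before any numeric use.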

Field “features” does not exist. SparkML

Submitted by 我怕爱的太早我们不能终老 on 2019-12-06 19:12:03
Question: I am trying to build a model in Spark ML with Zeppelin. I am new to this area and would like some help. I think I need to set the correct datatypes for the columns and set the first column as the label. Any help would be greatly appreciated, thank you. val training = sc.textFile("hdfs:///ford/fordTrain.csv") val header = training.first val inferSchema = true val df = training.toDF val lr = new LogisticRegression() .setMaxIter(10) .setRegParam(0.3) .setElasticNetParam(0.8) val lrModel = lr.fit(df
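The "Field features does not exist" error means the estimator expects a numeric label column and a vector features column, which the raw CSV-derived DataFrame does not have; in Spark ML the features column is typically built with VectorAssembler. A pure-Scala sketch of the same label/features split on one row (the first-column-is-label layout is an assumption from the question):

```scala
// Mirror of what the label/features preparation does per row: parse the
// columns as doubles, take the first as the label and the rest as features.
def toLabeled(line: String): (Double, Array[Double]) = {
  val cols = line.split(",").map(_.trim.toDouble)
  (cols.head, cols.tail) // first column as label, the rest as features
}

println(toLabeled("1,0.5,2.25"))
```

In the actual pipeline one would read the CSV with a schema (so columns are numeric, not strings), rename or cast the target column to "label", and let VectorAssembler produce "features" before calling lr.fit.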

Apache Zeppelin is started but there is a connection error at localhost:8080

Submitted by 淺唱寂寞╮ on 2019-12-06 15:06:22
Question: After successfully building Apache Zeppelin on Ubuntu 14, I start Zeppelin and it says it started successfully, but when I go to localhost:8080 Firefox shows an "unable to connect" error, as if it never started. However, when I check Zeppelin's status from the terminal it says running. I just copied the config file templates, so the config files are the defaults. Update: I changed the port to 8090; here is the config file, but the result is unchanged: <?xml version="1.0"?> <?xml-stylesheet type="text/xsl" href=
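For reference, the web UI port is controlled by a single property in conf/zeppelin-site.xml; a fragment matching the 8090 change described in the update would look like:

```xml
<!-- zeppelin-site.xml: port served by the Zeppelin web server;
     8090 mirrors the value mentioned in the question's update -->
<property>
  <name>zeppelin.server.port</name>
  <value>8090</value>
</property>
```

If the daemon reports running but the browser cannot connect, it is also worth checking which address Zeppelin bound to (zeppelin.server.addr) and the logs under the logs/ directory for a bind failure on startup.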