apache-spark-encoders

Scala generic encoder for Spark case class

对着背影说爱祢 submitted on 2019-12-04 08:30:44
How can I get this method to compile? Strangely, Spark's implicits are already imported.

    def loadDsFromHive[T <: Product](tableName: String, spark: SparkSession): Dataset[T] = {
      import spark.implicits._
      spark.sql(s"SELECT * FROM $tableName").as[T]
    }

This is the error:

    Unable to find encoder for type stored in a Dataset. Primitive types (Int, String, etc)
    and Product types (case classes) are supported by importing spark.implicits._
    Support for serializing other types will be added in future releases.
    [error] spark.sql(s"SELECT * FROM $tableName").as[T]

According to the source code for org.apache
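A common fix (a sketch, assuming Spark 2.x): the implicit newProductEncoder in spark.implicits._ requires a TypeTag for T, which is unavailable for an erased generic parameter unless the method demands one via a context bound:

    import scala.reflect.runtime.universe.TypeTag
    import org.apache.spark.sql.{Dataset, SparkSession}

    // The TypeTag context bound lets the compiler materialize Encoder[T]
    // (via spark.implicits.newProductEncoder) at each concrete call site.
    def loadDsFromHive[T <: Product : TypeTag](tableName: String, spark: SparkSession): Dataset[T] = {
      import spark.implicits._
      spark.sql(s"SELECT * FROM $tableName").as[T]
    }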

How to convert a dataframe to dataset in Apache Spark in Scala?

那年仲夏 submitted on 2019-12-03 05:56:10
I need to convert my dataframe to a dataset and I used the following code:

    val final_df = Dataframe.withColumn(
      "features",
      toVec4(
        // casting into Timestamp to parse the string, and then into Int
        $"time_stamp_0".cast(TimestampType).cast(IntegerType),
        $"count",
        $"sender_ip_1",
        $"receiver_ip_2"
      )
    ).withColumn("label", Dataframe("count")).select("features", "label")
    final_df.show()

    val trainingTest = final_df.randomSplit(Array(0.3, 0.7))
    val TrainingDF = trainingTest(0)
    val TestingDF = trainingTest(1)
    TrainingDF.show()
    TestingDF.show()

    // let's create our linear regression
    val lir = new
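For the conversion itself, a minimal sketch (assuming Spark 2.x, that toVec4 produces an org.apache.spark.ml.linalg.Vector, and a hypothetical case class name; the field types must match the actual column types):

    import org.apache.spark.ml.linalg.Vector

    // Field names must match the selected columns exactly;
    // Vector works here because it is a Spark SQL UDT.
    case class LabeledRecord(features: Vector, label: Int)

    import spark.implicits._
    val final_ds = final_df.as[LabeledRecord]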

How to create a custom Encoder in Spark 2.X Datasets?

痞子三分冷 submitted on 2019-12-02 22:38:47
Spark Datasets move away from Rows to Encoders for POJOs/primitives. The Catalyst engine uses an ExpressionEncoder to convert columns in a SQL expression. However, there do not appear to be other subclasses of Encoder available to use as a template for our own implementations. Here is an example of code that is happy in Spark 1.x / DataFrames but does not compile in the new regime:

    // mapping each row to RDD tuple
    df.map(row => {
      var id: String = if (!has_id) "" else row.getAs[String]("id")
      var label: String = row.getAs[String]("label")
      val channels: Int = if (!has_channels) 0 else row
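When a type falls outside what Catalyst can derive, the commonly cited escape hatch (a sketch, not an official extension point; LegacyRecord is a hypothetical class) is a serialization-based encoder from the Encoders factory, brought into implicit scope:

    import org.apache.spark.sql.{Encoder, Encoders}

    // A hypothetical class that Catalyst cannot encode structurally.
    class LegacyRecord(val id: String, val label: String, val channels: Int)

    // Stores each object as one opaque binary column: map/filter work,
    // but you lose per-field SQL access and columnar optimizations.
    implicit val legacyEncoder: Encoder[LegacyRecord] = Encoders.kryo[LegacyRecord]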

Encoder for Row Type Spark Datasets

牧云@^-^@ submitted on 2019-12-02 16:18:44
I would like to write an encoder for a Row type in Dataset, for a map operation that I am doing. Essentially, I do not understand how to write encoders. Below is an example of a map operation. In the example below, instead of returning Dataset<String>, I would like to return Dataset<Row>:

    Dataset<String> output = dataset1.flatMap(new FlatMapFunction<Row, String>() {
        @Override
        public Iterator<String> call(Row row) throws Exception {
            ArrayList<String> obj = //some map operation
            return obj.iterator();
        }
    }, Encoders.STRING());

I understand that instead of a string Encoder needs to be written as
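Row carries no compile-time schema, so the usual answer in Spark 2.x is to build the encoder from an explicit StructType with RowEncoder. A Scala sketch with a hypothetical output column follows (the Java equivalent passes RowEncoder.apply(schema) where Encoders.STRING() was):

    import org.apache.spark.sql.Row
    import org.apache.spark.sql.catalyst.encoders.RowEncoder
    import org.apache.spark.sql.types.{StringType, StructField, StructType}

    // Schema of the Rows the flatMap will emit.
    val schema = StructType(Seq(StructField("value", StringType, nullable = true)))

    val output = dataset1.flatMap((row: Row) =>
      Seq(Row(row.mkString(",")))   // stand-in for "some map operation"
    )(RowEncoder(schema))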

Spark Dataset : Example : Unable to generate an encoder issue

[亡魂溺海] submitted on 2019-11-30 08:33:16
Question: New to the Spark world and trying a dataset example written in Scala that I found online. On running it through SBT, I keep on getting the following error:

    org.apache.spark.sql.AnalysisException: Unable to generate an encoder for inner class

Any idea what I am overlooking? Also feel free to point out a better way of writing the same dataset example. Thanks.

    > sbt> runMain DatasetExample
    Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
    16/10/25 01:06:39 INFO Remoting: Starting remoting
    16/10/25 01:06:46 INFO Remoting: Remoting started; listening on addresses :[akka.tcp:/
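This AnalysisException typically means the case class used with the Dataset is declared inside another class or method, making it an inner class that Spark cannot instantiate on its own. A sketch of the usual fix (hypothetical Person fields; DatasetExample is the object name from the log; the sketch assumes the Spark 2.x SparkSession API, but the same top-level rule applies to 1.6's SQLContext):

    // Top level, outside any object or method, so Catalyst can
    // construct instances without an enclosing-instance reference.
    case class Person(name: String, age: Long)

    object DatasetExample {
      def main(args: Array[String]): Unit = {
        val spark = org.apache.spark.sql.SparkSession.builder()
          .appName("DatasetExample").master("local[*]").getOrCreate()
        import spark.implicits._
        Seq(Person("ann", 30L)).toDS().show()
      }
    }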

Why is the error “Unable to find encoder for type stored in a Dataset” when encoding JSON using case classes?

懵懂的女人 submitted on 2019-11-29 01:35:00
I've written a Spark job:

    object SimpleApp {
      def main(args: Array[String]) {
        val conf = new SparkConf().setAppName("Simple Application").setMaster("local")
        val sc = new SparkContext(conf)
        val ctx = new org.apache.spark.sql.SQLContext(sc)
        import ctx.implicits._

        case class Person(age: Long, city: String, id: String, lname: String, name: String, sex: String)
        case class Person2(name: String, age: Long, city: String)

        val persons = ctx.read.json("/tmp/persons.json").as[Person]
        persons.printSchema()
      }
    }

In the IDE, when I run the main function, 2 errors occur:

    Error:(15, 67) Unable to find encoder for type
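The problem is that Person and Person2 are defined inside main: encoder derivation through ctx.implicits._ cannot see case classes that are local to a method. A sketch of the fix is to hoist them to the top level:

    import org.apache.spark.{SparkConf, SparkContext}

    // Top-level case class: the implicit Encoder can now be derived.
    case class Person(age: Long, city: String, id: String, lname: String, name: String, sex: String)

    object SimpleApp {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("Simple Application").setMaster("local")
        val sc = new SparkContext(conf)
        val ctx = new org.apache.spark.sql.SQLContext(sc)
        import ctx.implicits._

        val persons = ctx.read.json("/tmp/persons.json").as[Person]
        persons.printSchema()
      }
    }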

How to create a Dataset of Maps?

心已入冬 submitted on 2019-11-27 14:51:51
I'm using Spark 2.2 and am running into trouble when attempting to call spark.createDataset on a Seq of Map. Code and output from my Spark Shell session follow:

    // createDataSet on Seq[T] where T = Int works
    scala> spark.createDataset(Seq(1, 2, 3)).collect
    res0: Array[Int] = Array(1, 2, 3)

    scala> spark.createDataset(Seq(Map(1 -> 2))).collect
    <console>:24: error: Unable to find encoder for type stored in a Dataset.
    Primitive types (Int, String, etc) and Product types (case classes) are supported
    by importing spark.implicits._ Support for serializing other types will be added in future
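Spark 2.2's implicits do not expose an encoder for a top-level Map (newMapEncoder arrived in 2.3, where import spark.implicits._ alone suffices), so one workaround, sketched here, is to derive the encoder explicitly:

    import org.apache.spark.sql.Encoder
    import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder

    // Explicitly derives a Catalyst encoder for Map[Int, Int].
    implicit val mapIntIntEnc: Encoder[Map[Int, Int]] = ExpressionEncoder()

    val ds = spark.createDataset(Seq(Map(1 -> 2)))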

Encode an ADT / sealed trait hierarchy into Spark DataSet column

我的梦境 submitted on 2019-11-27 09:29:10
If I want to store an Algebraic Data Type (ADT) (i.e. a Scala sealed trait hierarchy) within a Spark Dataset column, what is the best encoding strategy? For example, if I have an ADT where the leaf types store different kinds of data:

    sealed trait Occupation
    case object SoftwareEngineer extends Occupation
    case class Wizard(level: Int) extends Occupation
    case class Other(description: String) extends Occupation

What's the best way to construct a:

    org.apache.spark.sql.Dataset[Occupation]

TL;DR There is no good solution right now, and given the Spark SQL / Dataset implementation, it is unlikely there
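Given that answer, the fallback most often suggested (a sketch; each value becomes an opaque binary blob, so you lose columnar storage and field-level queries) is a kryo encoder for the sealed trait:

    import org.apache.spark.sql.{Encoder, Encoders}

    sealed trait Occupation
    case object SoftwareEngineer extends Occupation
    case class Wizard(level: Int) extends Occupation
    case class Other(description: String) extends Occupation

    // One binary column covers the whole hierarchy.
    implicit val occupationEnc: Encoder[Occupation] = Encoders.kryo[Occupation]

    // val ds = spark.createDataset(Seq[Occupation](SoftwareEngineer, Wizard(3)))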
