apache-spark-encoders

Encode an ADT / sealed trait hierarchy into Spark DataSet column

Submitted by 懵懂的女人 on 2019-11-26 14:46:04
Question: If I want to store an Algebraic Data Type (ADT) (i.e. a Scala sealed trait hierarchy) within a Spark Dataset column, what is the best encoding strategy? For example, if I have an ADT where the leaf types store different kinds of data:

    sealed trait Occupation
    case object SoftwareEngineer extends Occupation
    case class Wizard(level: Int) extends Occupation
    case class Other(description: String) extends Occupation

What's the best way to construct an org.apache.spark.sql.Dataset[Occupation]?

Answer 1: TL;DR
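Spark does not derive a structured encoder for an arbitrary sealed trait hierarchy, so a common workaround is to fall back to a binary Kryo encoder for the whole ADT. Below is a minimal sketch of that approach; the SparkSession settings and the sample values are illustrative assumptions, and the trade-off is that the resulting column is opaque to Spark SQL (no pruning or predicate pushdown):

    import org.apache.spark.sql.{Encoder, Encoders, SparkSession}

    sealed trait Occupation
    case object SoftwareEngineer extends Occupation
    case class Wizard(level: Int) extends Occupation
    case class Other(description: String) extends Occupation

    object AdtEncoderExample {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .master("local[*]")          // assumption: local run for the sketch
          .appName("adt-encoder")
          .getOrCreate()

        // Kryo serializes each Occupation value into a single binary "value" column.
        implicit val occupationEncoder: Encoder[Occupation] = Encoders.kryo[Occupation]

        val ds = spark.createDataset(
          Seq[Occupation](SoftwareEngineer, Wizard(9), Other("plumber"))
        )
        ds.show()  // displays the serialized bytes, not a structured schema

        spark.stop()
      }
    }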

Why is “Unable to find encoder for type stored in a Dataset” when creating a dataset of custom case class?

Submitted by 风流意气都作罢 on 2019-11-26 14:34:11
Spark 2.0 (final) with Scala 2.11.8. The following super simple code yields the compilation error:

    Error:(17, 45) Unable to find encoder for type stored in a Dataset. Primitive types (Int, String, etc) and Product types (case classes) are supported by importing spark.implicits._ Support for serializing other types will be added in future releases.

    import org.apache.spark.sql.SparkSession

    case class SimpleTuple(id: Int, desc: String)

    object DatasetTest {
      val dataList = List(
        SimpleTuple(5, "abc"),
        SimpleTuple(6, "bcd")
      )

      def main(args: Array[String]): Unit = {
        val sparkSession = SparkSession
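The usual fix for this error is to bring the session's implicit encoders into scope before calling createDataset. A minimal sketch of how the truncated snippet above is typically completed (the builder options are assumptions, not part of the original post):

    import org.apache.spark.sql.SparkSession

    // Defined at the top level (not inside a method) so encoder derivation can find a TypeTag.
    case class SimpleTuple(id: Int, desc: String)

    object DatasetTest {
      val dataList = List(
        SimpleTuple(5, "abc"),
        SimpleTuple(6, "bcd")
      )

      def main(args: Array[String]): Unit = {
        val sparkSession = SparkSession.builder()
          .master("local[*]")          // assumption for the sketch
          .appName("dataset-test")
          .getOrCreate()

        // Brings the implicit Encoder[SimpleTuple] into scope, which resolves the error.
        import sparkSession.implicits._

        val ds = sparkSession.createDataset(dataList)
        ds.show()

        sparkSession.stop()
      }
    }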

How to store custom objects in Dataset?

Submitted by 丶灬走出姿态 on 2019-11-25 22:30:49
Question: According to Introducing Spark Datasets:

    As we look forward to Spark 2.0, we plan some exciting improvements to Datasets, specifically: ... Custom encoders – while we currently autogenerate encoders for a wide variety of types, we'd like to open up an API for custom objects.

Yet attempts to store a custom type in a Dataset lead to an error like the following:

    Unable to find encoder for type stored in a Dataset. Primitive types (Int, String, etc) and Product types (case classes) are supported by
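For a type that is neither a primitive nor a Product (case class or tuple), one workaround commonly used with Spark 2.x is a binary encoder such as Encoders.kryo. A rough sketch follows; the MyObj class and the session setup are illustrative assumptions:

    import org.apache.spark.sql.{Encoder, Encoders, SparkSession}

    // A plain class: not a case class, so no encoder is derived for it automatically.
    class MyObj(val id: Int, val name: String) extends Serializable

    object CustomObjectExample {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .master("local[*]")          // assumption for the sketch
          .appName("custom-objects")
          .getOrCreate()

        // Kryo stores each object as one binary column; Spark SQL cannot see its fields.
        implicit val myObjEncoder: Encoder[MyObj] = Encoders.kryo[MyObj]

        val ds = spark.createDataset(Seq(new MyObj(1, "a"), new MyObj(2, "b")))
        ds.printSchema()  // root |-- value: binary

        spark.stop()
      }
    }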

Encoder error while trying to map dataframe row to updated row

Submitted by 扶醉桌前 on 2019-11-25 21:43:52
When I'm trying to do the same thing in my code, as mentioned below:

    dataframe.map(row => {
      val row1 = row.getAs[String](1)
      val make = if (row1.toLowerCase == "tesla") "S" else row1
      Row(row(0), make, row(2))
    })

I have taken the above reference from here: Scala: How can I replace value in Dataframs using scala. But I am getting an encoder error:

    Unable to find encoder for type stored in a Dataset. Primitive types (Int, String, etc) and Product types (case classes) are supported by importing spark.implicits._ Support for serializing other types will be added in future releases.

Note: I am using
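The error here comes from returning a generic Row from map, for which no implicit encoder exists. One way around it is to return a type Spark already knows how to encode, such as a tuple, and rebuild the DataFrame afterwards. A minimal sketch under assumed column types and names (the three-column layout comes from the snippet above; the String types and the output column names are assumptions):

    import org.apache.spark.sql.DataFrame

    object MakeColumnFix {
      def rewriteMake(dataframe: DataFrame): DataFrame = {
        val spark = dataframe.sparkSession
        import spark.implicits._  // provides encoders for tuples of supported types

        dataframe.map { row =>
          val make  = row.getAs[String](1)
          val fixed = if (make.toLowerCase == "tesla") "S" else make
          // Returning a tuple instead of a Row lets Spark derive the encoder implicitly.
          (row.getAs[String](0), fixed, row.getAs[String](2))
        }.toDF("col0", "make", "col2")
      }
    }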