apache-spark-encoders

Spark Error: Unable to find encoder for type stored in a Dataset

China☆狼群 submitted on 2021-01-27 07:50:22
Question: I am using Spark on a Zeppelin notebook, and groupByKey() does not seem to be working. This code: df.groupByKey(row => row.getLong(0)).mapGroups((key, iterable) => println(key)) gives me this error (presumably a compilation error, since it shows up in no time while the dataset I am working on is pretty big): error: Unable to find encoder for type stored in a Dataset. Primitive types (Int, String, etc) and Product types (case classes) are supported by importing spark.implicits._ Support for serializing other types will be added in future releases.
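
The usual cause here is twofold: spark.implicits._ is not imported in the notebook paragraph, and mapGroups returns Unit (the result of println), for which no encoder exists. A minimal sketch of a working variant, assuming a DataFrame whose first column is a Long; the example data and column name are placeholders of mine, not the asker's:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("GroupByKeyExample").getOrCreate()
    import spark.implicits._                        // encoders for primitives, tuples, case classes

    val df = spark.range(10).toDF("value")          // hypothetical stand-in for the asker's df

    val counts = df
      .groupByKey(row => row.getLong(0) % 3)        // Long key -> encoder comes from the implicits
      .mapGroups((key, rows) => (key, rows.size))   // return a tuple instead of the Unit from println
    counts.show()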

Why is “Unable to find encoder for type stored in a Dataset” when creating a dataset of custom case class?

断了今生、忘了曾经 submitted on 2020-01-08 12:23:52
Question: Spark 2.0 (final) with Scala 2.11.8. The following super simple code yields the compilation error Error:(17, 45) Unable to find encoder for type stored in a Dataset. Primitive types (Int, String, etc) and Product types (case classes) are supported by importing spark.implicits._ Support for serializing other types will be added in future releases. import org.apache.spark.sql.SparkSession case class SimpleTuple(id: Int, desc: String) object DatasetTest { val dataList = List( SimpleTuple(5, "abc…
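
The accepted fix for this one is to declare the case class at the top level, outside the object (or method) that builds the Dataset, and to have spark.implicits._ in scope before calling createDataset or toDS. A minimal sketch; the second list element is a placeholder of mine, since the excerpt is cut off:

    import org.apache.spark.sql.SparkSession

    case class SimpleTuple(id: Int, desc: String)   // top level, so Spark can derive an Encoder

    object DatasetTest {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("DatasetTest").master("local[*]").getOrCreate()
        import spark.implicits._

        val dataList = List(SimpleTuple(5, "abc"), SimpleTuple(6, "bcd"))  // second element assumed
        val dataset = spark.createDataset(dataList)   // or dataList.toDS()
        dataset.show()
      }
    }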

Convert scala list to DataFrame or DataSet

夙愿已清 submitted on 2020-01-02 03:00:15
Question: I am new to Scala. I am trying to convert a Scala list (which holds the results of some calculated data on a source DataFrame) to a DataFrame or Dataset. I have not found any direct method to do that. I have tried the following process to convert my list to a Dataset, but it does not seem to work. I describe the 3 situations below. Can someone please point me in the right direction? Thanks. import org.apache.spark.sql.{DataFrame, Row, SQLContext, …
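
A sketch of the three usual routes from a local Scala list to a DataFrame or Dataset, using toDF, toDS and createDataset; the Result case class and its values are placeholders, since the asker's actual data is not shown:

    import org.apache.spark.sql.SparkSession

    case class Result(name: String, score: Double)   // hypothetical shape of the computed list

    object ListToDataset {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("ListToDataset").master("local[*]").getOrCreate()
        import spark.implicits._

        val results = List(Result("a", 1.0), Result("b", 2.5))

        val asDF = results.toDF()                      // DataFrame with columns name, score
        val asDS = results.toDS()                      // Dataset[Result]
        val viaCreate = spark.createDataset(results)   // equivalent to toDS()

        asDF.show()
        asDS.printSchema()
      }
    }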

How to create a custom Encoder in Spark 2.X Datasets?

筅森魡賤 submitted on 2019-12-31 12:21:05
Question: Spark Datasets move away from Rows to Encoders for POJOs/primitives. The Catalyst engine uses an ExpressionEncoder to convert columns in a SQL expression. However, there do not appear to be other subclasses of Encoder available to use as a template for our own implementations. Here is an example of code that is happy in Spark 1.X / DataFrames but does not compile in the new regime: //mapping each row to RDD tuple df.map(row => { var id: String = if (!has_id) "" else row.getAs[String]("id"…
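
Rather than subclassing Encoder directly, the factory methods on org.apache.spark.sql.Encoders (product, bean, kryo, javaSerialization) are the supported way to obtain an encoder for a type Catalyst cannot derive on its own. A sketch under that assumption, with a made-up non-case class:

    import org.apache.spark.sql.{Encoder, Encoders, SparkSession}

    class LegacyRecord(val id: String, val score: Double) extends Serializable  // not a case class

    object CustomEncoderExample {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("CustomEncoder").master("local[*]").getOrCreate()
        import spark.implicits._

        // Kryo stores each object as a single binary column; no Catalyst schema is derived.
        implicit val legacyEncoder: Encoder[LegacyRecord] = Encoders.kryo[LegacyRecord]

        val ds = spark.createDataset(Seq(new LegacyRecord("a", 1.0), new LegacyRecord("b", 2.0)))
        ds.map(_.id).show()   // mapping back to String uses the implicit String encoder
      }
    }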

How to convert a dataframe to dataset in Apache Spark in Scala?

左心房为你撑大大i submitted on 2019-12-30 04:01:06
Question: I need to convert my DataFrame to a Dataset, and I used the following code: val final_df = Dataframe.withColumn( "features", toVec4( // casting into Timestamp to parse the string, and then into Int $"time_stamp_0".cast(TimestampType).cast(IntegerType), $"count", $"sender_ip_1", $"receiver_ip_2" ) ).withColumn("label", (Dataframe("count"))).select("features", "label") final_df.show() val trainingTest = final_df.randomSplit(Array(0.3, 0.7)) val TrainingDF = trainingTest(0) val TestingDF…
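
For the DataFrame-to-Dataset step itself, the usual pattern is df.as[T] with a case class whose field names and types match the DataFrame's columns. A minimal sketch with an assumed two-column schema, not the asker's actual features/label columns:

    import org.apache.spark.sql.SparkSession

    case class LabeledRow(label: Double, count: Long)   // field names must match column names

    object DfToDs {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("DfToDs").master("local[*]").getOrCreate()
        import spark.implicits._

        val df = Seq((1.0, 10L), (0.0, 3L)).toDF("label", "count")
        val ds = df.as[LabeledRow]   // succeeds because names and types line up
        ds.show()
      }
    }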

Spark Encoders: when to use beans()

巧了我就是萌 submitted on 2019-12-24 11:16:13
Question: I came across a memory management problem while using Spark's caching mechanism. I am currently using Encoders with Kryo and was wondering whether switching to beans would help me reduce the size of my cached dataset. Basically, what are the pros and cons of using beans over Kryo serialization when working with Encoders? Are there any performance improvements? Is there a way to compress a cached Dataset apart from caching with the SER option? For the record, I have found a similar topic that…
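
A sketch contrasting the two factories (the class and field names are mine). A bean encoder maps the object's properties to typed columns, so a cached Dataset can benefit from columnar storage and pruning, while Encoders.kryo stores each object as one opaque binary column, which usually caches larger:

    import org.apache.spark.sql.{Encoders, SparkSession}
    import scala.beans.BeanProperty

    class Event(@BeanProperty var id: Long, @BeanProperty var name: String) extends Serializable {
      def this() = this(0L, "")   // bean encoders need a no-arg constructor plus getters/setters
    }

    object BeanVsKryo {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("BeanVsKryo").master("local[*]").getOrCreate()

        val data = Seq(new Event(1L, "click"), new Event(2L, "view"))

        val beanDS = spark.createDataset(data)(Encoders.bean(classOf[Event]))
        beanDS.printSchema()   // id: bigint, name: string -> typed, columnar-friendly

        val kryoDS = spark.createDataset(data)(Encoders.kryo(classOf[Event]))
        kryoDS.printSchema()   // value: binary -> one serialized blob per object
      }
    }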

How to pass Encoder as parameter to dataframe's as method

只谈情不闲聊 submitted on 2019-12-24 09:48:12
Question: I want to convert a DataFrame to a Dataset by using different case classes. Now, my code is like below. case class Views(views: Double) case class Clicks(clicks: Double) def convertViewsDFtoDS(df: DataFrame){ df.as[Views] } def convertClicksDFtoDS(df: DataFrame){ df.as[Clicks] } So, my question is: "Is there any way I can use one general function for this by passing the case class as an extra parameter to the function?" Answer 1: It seems a bit obsolete (the as method does exactly what you want) but you can import org…
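
One way to make the conversion generic (a sketch; the helper name is mine) is a type parameter with an Encoder context bound, so a single function covers both case classes and df.as[T] still finds the encoder at the call site:

    import org.apache.spark.sql.{DataFrame, Dataset, Encoder, SparkSession}

    case class Views(views: Double)
    case class Clicks(clicks: Double)

    object ConvertExample {
      // "T : Encoder" requires an implicit Encoder[T] to be in scope at the call site
      def convertDFtoDS[T: Encoder](df: DataFrame): Dataset[T] = df.as[T]

      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("Convert").master("local[*]").getOrCreate()
        import spark.implicits._

        val viewsDF  = Seq(1.0, 2.0).toDF("views")
        val clicksDF = Seq(3.0).toDF("clicks")

        val viewsDS: Dataset[Views]   = convertDFtoDS[Views](viewsDF)
        val clicksDS: Dataset[Clicks] = convertDFtoDS[Clicks](clicksDF)
        viewsDS.show()
      }
    }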

Encoder for Row Type Spark Datasets

戏子无情 submitted on 2019-12-20 08:36:53
Question: I would like to write an encoder for a Row type in a Dataset, for a map operation that I am doing. Essentially, I do not understand how to write encoders. Below is an example of a map operation: in the example below, instead of returning Dataset<String>, I would like to return Dataset<Row>. Dataset<String> output = dataset1.flatMap(new FlatMapFunction<Row, String>() { @Override public Iterator<String> call(Row row) throws Exception { ArrayList<String> obj = //some map operation return obj…
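
The question is in Java, but the usual answer is the same in either language: build an explicit StructType for the output rows and wrap it in RowEncoder (Spark 2.x) so flatMap can return Dataset[Row]. A Scala sketch with made-up input data and schema:

    import org.apache.spark.sql.{Row, SparkSession}
    import org.apache.spark.sql.catalyst.encoders.RowEncoder
    import org.apache.spark.sql.types.{StringType, StructField, StructType}

    object RowEncoderExample {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("RowEncoderExample").master("local[*]").getOrCreate()
        import spark.implicits._

        val dataset1 = Seq("a,b", "c").toDF("value")   // hypothetical input

        val outputSchema = StructType(Seq(StructField("token", StringType, nullable = true)))
        implicit val rowEnc = RowEncoder(outputSchema)   // an Encoder[Row] for the target schema

        val output = dataset1.flatMap { row =>
          row.getString(0).split(",").toSeq.map(tok => Row(tok))   // each token becomes one Row
        }
        output.show()
      }
    }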

Why is the error “Unable to find encoder for type stored in a Dataset” when encoding JSON using case classes?

丶灬走出姿态 submitted on 2019-12-18 03:14:06
Question: I've written a Spark job: object SimpleApp { def main(args: Array[String]) { val conf = new SparkConf().setAppName("Simple Application").setMaster("local") val sc = new SparkContext(conf) val ctx = new org.apache.spark.sql.SQLContext(sc) import ctx.implicits._ case class Person(age: Long, city: String, id: String, lname: String, name: String, sex: String) case class Person2(name: String, age: Long, city: String) val persons = ctx.read.json("/tmp/persons.json").as[Person] persons.printSchema() }
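
The accepted fix for this last one: Spark cannot derive encoders for case classes declared inside a method, so moving Person and Person2 to the top level (outside main, and outside the object) resolves the error while keeping the implicits import. A sketch of the reorganized job:

    import org.apache.spark.{SparkConf, SparkContext}

    case class Person(age: Long, city: String, id: String, lname: String, name: String, sex: String)
    case class Person2(name: String, age: Long, city: String)

    object SimpleApp {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("Simple Application").setMaster("local")
        val sc = new SparkContext(conf)
        val ctx = new org.apache.spark.sql.SQLContext(sc)
        import ctx.implicits._

        val persons = ctx.read.json("/tmp/persons.json").as[Person]   // Encoder[Person] is now derivable
        persons.printSchema()
      }
    }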