SortedMap non serializable error in Spark Dataset

Submitted by 我与影子孤独终老i on 2019-12-24 10:47:34

Question


It seems like scala.collection.SortedMap is not serializable?

Simple code example:

import scala.collection.SortedMap

case class MyClass(s: scala.collection.SortedMap[String, String] = SortedMap[String, String]())

object MyClass {
  def apply(i: Int): MyClass = MyClass()
}

import sparkSession.implicits._

List(MyClass(1), MyClass()).toDS().show(2)

This returns:

+-----+
|    s|
+-----+
|Map()|
|Map()|
+-----+

On the other hand, take() will fail miserably at execution time:

List(MyClass(1), MyClass()).toDS().take(2)

ERROR codegen.CodeGenerator: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 116, Column 100: No applicable constructor/method found for actual parameters "scala.collection.Map"; candidates are: "com.caspida.algorithms.security.offline.exfiltrationthreat.MyClass(scala.collection.SortedMap)"


Answer 1:


The Scala types supported by Spark (as of 2.1.0) do not include scala.collection.SortedMap. A list of supported types can be found here:

https://spark.apache.org/docs/latest/sql-programming-guide.html#data-types

As the link suggests, the supported type for maps is scala.collection.Map, so the following works:

case class MyClass(s: scala.collection.Map[String, String] = SortedMap[String, String]())


scala> spark.createDataset( MyClass() :: Nil ).collect()
res: Array[MyClass] = Array(MyClass(Map()))
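
If the sort order is still needed after reading the data back, one option is to keep the field typed as scala.collection.Map, as above, and rebuild a SortedMap on demand. The sketch below is only an illustration and not part of the original answer: the `sorted` helper is a hypothetical name, and it assumes the key type has an implicit Ordering (true for String). It uses no Spark APIs, so it can be tried in a plain Scala REPL.

```scala
import scala.collection.SortedMap

// Field declared with the supported type scala.collection.Map, as in the answer.
// A SortedMap is still a valid default because SortedMap extends Map.
case class MyClass(s: scala.collection.Map[String, String] = SortedMap.empty[String, String]) {
  // Hypothetical helper: rebuilds a key-ordered map from whatever Map
  // implementation the field currently holds.
  def sorted: SortedMap[String, String] = SortedMap(s.toSeq: _*)
}
```

Because the error message in the question shows Spark handing a plain scala.collection.Map to the constructor, the sort order should not be assumed to survive a round trip through a Dataset; calling `sorted` on each collected element restores it.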


Source: https://stackoverflow.com/questions/42964531/sortedmap-non-serializable-error-in-spark-dataset
