Apache Spark 2.1 : java.lang.UnsupportedOperationException: No Encoder found for scala.collection.immutable.Set[String]

让人想犯罪 __ Submitted on 2019-12-06 07:42:59

The problem here is that Spark does not provide an encoder for Set out of the box (it does provide encoders for "primitives", Seqs, Arrays, and Products of other supported types).

You can either try using this excellent answer to create your own encoder for Set[String] (more accurately, an encoder for the type you're using, Traversable[((String, String), (String, Set[String]))], which contains a Set[String]), OR you can work around this issue by using a Seq instead of a Set:

// ...
case Some(x: Traversable[(String, String)]) =>
  //println("In flatMap:" + x + " ~~&~~ " + text + " ~~&~~ " + storylines)
  namedEnts.map((_, (text, storylines.toSeq.distinct)))
// ...

(I'm using distinct to imitate the Set behavior; you can also try .toSet.toSeq.)
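To see why .toSeq.distinct stands in for Set semantics, here is a plain-Scala sketch (no Spark needed; the storylines value is hypothetical sample data, not from the question): distinct drops duplicates while preserving encounter order, whereas .toSet.toSeq also deduplicates but makes no ordering guarantee.

```scala
// Hypothetical sample data standing in for `storylines`.
val storylines: Seq[String] = Seq("plotA", "plotB", "plotA", "plotC")

// Deduplicates and keeps first-occurrence order.
val viaDistinct = storylines.distinct     // Seq("plotA", "plotB", "plotC")

// Deduplicates too, but the resulting order is unspecified.
val viaSet = storylines.toSet.toSeq

assert(viaDistinct == Seq("plotA", "plotB", "plotC"))
assert(viaDistinct.toSet == viaSet.toSet) // same elements either way
```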

UPDATE: regarding your comment about Spark 1.6.2: the difference is that in 1.6.2, Dataset.flatMap returns an RDD and not a Dataset, so the results returned from the function you supply require no encoding. This suggests another good workaround: you can easily simulate that behavior by explicitly switching to the RDD API before the flatMap operation:

nameDF.select("namedEnts", "text", "storylines")
  .rdd
  .flatMap { /*...*/ } // use your function as-is, it can return Set[String]
  .aggregateByKey( /*...*/ )
  .map( /*...*/ )
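For completeness, the first option above (supplying your own encoder) can also be sketched with Spark's built-in Kryo fallback rather than a hand-written one. This is a minimal sketch, not the exact code from the linked answer: Encoders.kryo stores the value as a single opaque binary column, so you lose Catalyst's columnar optimizations, but it lets the flatMap result type contain a Set[String].

```scala
import org.apache.spark.sql.{Encoder, Encoders}

// Kryo-based fallback encoder for the element type produced by the
// flatMap in the question; bring it into implicit scope before calling
// flatMap so Spark can encode the Set[String] inside the tuple.
implicit val rowEnc: Encoder[((String, String), (String, Set[String]))] =
  Encoders.kryo[((String, String), (String, Set[String]))]
```

The trade-off is that a Kryo-encoded column is a black box to Spark SQL (you can't filter or select on its fields), so the Seq workaround above is usually preferable when it fits.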