SparkSQL Aggregator: MissingRequirementError

Submitted by 浪子不回头ぞ on 2019-12-25 07:14:08

Question


I am trying to use Datasets in Apache Spark 2.0:

import org.apache.spark.sql.expressions.Aggregator
import org.apache.spark.sql.Encoder
import spark.implicits._

case class C1(f1: String, f2: String, f3: String, f4: String, f5: Double)

val teams = Seq(
  C1("hash1", "NLC", "Cubs", "2016-01-23", 3253.21),
  C1("hash1", "NLC", "Cubs", "2014-01-23", 353.88),
  C1("hash3", "NLW", "Dodgers", "2013-08-15", 4322.12),
  C1("hash4", "NLE", "Red Sox", "2010-03-14", 10283.72)
).toDS()

val c1Agg = new Aggregator[C1, Seq[C1], Seq[C1]] with Serializable {
  def zero: Seq[C1] = Seq.empty[C1] //Nil
  def reduce(b: Seq[C1], a: C1): Seq[C1] = b :+ a
  def merge(b1: Seq[C1], b2: Seq[C1]): Seq[C1] = b1 ++ b2
  def finish(r: Seq[C1]): Seq[C1] = r

  override def bufferEncoder: Encoder[Seq[C1]] = newProductSeqEncoder[C1]
  override def outputEncoder: Encoder[Seq[C1]] = newProductSeqEncoder[C1]
}.toColumn

val g_c1 = teams.groupByKey(_.f1).agg(c1Agg).collect

But when I run it, I get the following error message:

scala.reflect.internal.MissingRequirementError: class lineb4c2bb72bf6e417e9975d1a65602aec912.$read in JavaMirror with sun.misc.Launcher$AppClassLoader@14dad5dc of type class sun.misc.Launcher$AppClassLoader with class path [OMITTED] not found

I assume the configuration is correct because I am running on the Databricks community cloud.


Answer 1:


I was finally able to make it work by using ExpressionEncoder() rather than newProductSeqEncoder[C1] on lines 20 and 21 of the code above (the bufferEncoder and outputEncoder overrides).

(Not sure why the previous code did not work though.)
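
For reference, here is a minimal sketch of the change described above, applied to the code from the question. It assumes a Spark 2.x or 3.x runtime in a notebook or spark-shell; the SparkSession setup lines (including the app name) are my own addition so the snippet is self-contained, and in an environment that already provides `spark` they should simply return the existing session. ExpressionEncoder is imported from org.apache.spark.sql.catalyst.encoders.

```scala
import org.apache.spark.sql.{Encoder, SparkSession}
import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder
import org.apache.spark.sql.expressions.Aggregator

// Assumption: added for self-containment; a notebook or spark-shell
// already has a session, and getOrCreate() will reuse it.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("c1-agg-sketch") // hypothetical app name
  .getOrCreate()
import spark.implicits._

case class C1(f1: String, f2: String, f3: String, f4: String, f5: Double)

val teams = Seq(
  C1("hash1", "NLC", "Cubs", "2016-01-23", 3253.21),
  C1("hash1", "NLC", "Cubs", "2014-01-23", 353.88),
  C1("hash3", "NLW", "Dodgers", "2013-08-15", 4322.12),
  C1("hash4", "NLE", "Red Sox", "2010-03-14", 10283.72)
).toDS()

// Collects all rows that share a key into one Seq[C1].
val c1Agg = new Aggregator[C1, Seq[C1], Seq[C1]] {
  def zero: Seq[C1] = Seq.empty[C1]
  def reduce(b: Seq[C1], a: C1): Seq[C1] = b :+ a
  def merge(b1: Seq[C1], b2: Seq[C1]): Seq[C1] = b1 ++ b2
  def finish(r: Seq[C1]): Seq[C1] = r

  // The only change from the question: ExpressionEncoder() replaces
  // newProductSeqEncoder[C1] for both encoders.
  def bufferEncoder: Encoder[Seq[C1]] = ExpressionEncoder[Seq[C1]]()
  def outputEncoder: Encoder[Seq[C1]] = ExpressionEncoder[Seq[C1]]()
}.toColumn

// Array of (key, rows-for-that-key) pairs.
val g_c1 = teams.groupByKey(_.f1).agg(c1Agg).collect()
```

The result is one (key, Seq[C1]) pair per distinct f1 value; the order of rows inside each Seq is not guaranteed.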



Source: https://stackoverflow.com/questions/38046898/sparksql-aggregator-missingrequirementerror
