Task not serializable while using custom dataframe class in Spark Scala

微笑、不失礼 提交于 2019-12-04 21:06:29

It looks like spark tries to serialize all the scope around testList. Try to inline data @inline val testList = List[String]("a", "b") or use different object where you store function/data which you pass to drivers.

Just add 'extends Serializable' This work for me

/**
   * A wrapper around ProducerRecord RDD that allows to save RDD to Kafka.
   *
   * KafkaProducer is shared within all threads in one executor.
   * Error handling strategy - remember "last" seen exception and rethrow it to allow task fail.
   */
 implicit class DatasetKafkaSink(ds: Dataset[ProducerRecord[String, GenericRecord]]) extends Serializable {

   class ExceptionRegisteringCallback extends Callback {
     private[this] val lastRegisteredException = new AtomicReference[Option[Exception]](None)

     override def onCompletion(metadata: RecordMetadata, exception: Exception): Unit = {
       Option(exception) match {
         case a @ Some(_) => lastRegisteredException.set(a) // (re)-register exception if send failed
         case _ => // do nothing if encountered successful send
       }
     }

     def rethrowException(): Unit = lastRegisteredException.getAndSet(None).foreach(e => throw e)
   }

   /**
     * Save to Kafka reusing KafkaProducer from singleton holder.
     * Returns back control only once all records were actually sent to Kafka, in case of error rethrows "last" seen
     * exception in the same thread to allow Spark task to fail
     */
   def saveToKafka(kafkaProducerConfigs: Map[String, AnyRef]): Unit = {
     ds.foreachPartition { records =>
       val callback = new ExceptionRegisteringCallback
       val producer = KafkaProducerHolder.getInstance(kafkaProducerConfigs)

       records.foreach(record => producer.send(record, callback))

       producer.flush()
       callback.rethrowException()
     }
   }
 }'
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!