Spark Kafka producer serializable

离开以前 2020-12-18 06:33

I'm getting the following exception:

ERROR yarn.ApplicationMaster: User class threw exception: org.apache.spark.SparkException: Task not serializable org
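
The question doesn't show the driver code, but a typical way to hit this error (a hypothetical sketch; the broker address is a placeholder and props/requestSet are taken from the answers below) is to construct the KafkaProducer on the driver and reference it inside an RDD closure, which forces Spark to try to serialize it:

    import java.util.Properties
    import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

    val props = new Properties()
    props.put("bootstrap.servers", "broker:9092") // placeholder address
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

    // Created on the driver...
    val producer = new KafkaProducer[String, String](props)

    // ...but captured by a closure that runs on the executors, so Spark has to
    // serialize it and fails with "Task not serializable".
    requestSet.foreach { line =>
      producer.send(new ProducerRecord[String, String]("testtopic", line))
    }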

2 Answers
  • 2020-12-18 06:53

    I don't recommend Yuval Itzchakov's answer, because it opens and closes a lot of sockets, and even opening a connection to a Kafka broker is heavy and slow. I strongly recommend reading this blog post: https://allegro.tech/2015/08/spark-kafka-integration.html. I've used and tested it, it's the best option, and I've also run it in a production environment. A sketch of that approach follows.
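
    A minimal sketch of the pattern the blog post describes: wrap the producer in a lazily initialised, serializable holder so each executor JVM creates a single KafkaProducer and reuses it across partitions and batches instead of opening and closing one per partition. The KafkaSink name and shutdown-hook detail follow the blog post; props is assumed to be the same producer configuration used in the other answer.

    import java.util.Properties
    import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

    class KafkaSink(createProducer: () => KafkaProducer[String, String]) extends Serializable {
      // Created lazily on the executor, so the producer itself is never serialized.
      lazy val producer: KafkaProducer[String, String] = createProducer()

      def send(topic: String, value: String): Unit =
        producer.send(new ProducerRecord[String, String](topic, value))
    }

    object KafkaSink {
      def apply(config: Properties): KafkaSink = {
        val createProducer = () => {
          val producer = new KafkaProducer[String, String](config)
          // Flush any buffered records when the executor JVM shuts down.
          sys.addShutdownHook {
            producer.close()
          }
          producer
        }
        new KafkaSink(createProducer)
      }
    }

    It can then be broadcast once, e.g. val kafkaSink = sc.broadcast(KafkaSink(props)), and used inside any closure with kafkaSink.value.send("testtopic", line).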

  • 2020-12-18 06:59

    KafkaProducer isn't serializable. You'll need to move the creation of the instance to inside foreachPartition:

    import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

    requestSet.foreachPartition((partitions: Iterator[String]) => {
      // Create the producer on the executor, so it never needs to be serialized.
      val producer: KafkaProducer[String, String] = new KafkaProducer[String, String](props)
      partitions.foreach((line: String) => {
        try {
          producer.send(new ProducerRecord[String, String]("testtopic", line))
        } catch {
          case ex: Exception =>
            log.warn(ex.getMessage, ex)
        }
      })
      producer.close() // flush buffered records before the task ends
    })
    

    Note that KafkaProducer.send returns a Future[RecordMetadata], and the only exception that can propagate from it is SerializationException if the key or value can't be serialized.
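
    Because send is asynchronous, the try/catch above only sees errors raised while enqueueing the record; broker-side failures are reported later. If you want to log those as well, one option (a sketch, not part of the original answer) is to pass a Callback to send:

    import org.apache.kafka.clients.producer.{Callback, KafkaProducer, ProducerRecord, RecordMetadata}

    requestSet.foreachPartition((partitions: Iterator[String]) => {
      val producer = new KafkaProducer[String, String](props)
      partitions.foreach { line =>
        producer.send(new ProducerRecord[String, String]("testtopic", line), new Callback {
          // Invoked once the broker acknowledges (or rejects) the record.
          override def onCompletion(metadata: RecordMetadata, exception: Exception): Unit =
            if (exception != null) log.warn(exception.getMessage, exception)
        })
      }
      producer.close() // blocks until buffered records have been delivered
    })

    Since close() waits for outstanding sends to complete, the callbacks will have fired by the time the task finishes.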
