I'm getting the following exception:
ERROR yarn.ApplicationMaster: User class threw exception: org.apache.spark.SparkException: Task not serializable org
I don't recommend the answer from Yuval Itzchakov, because it opens and closes a lot of sockets (one producer per partition per batch), and opening a connection to a Kafka broker is heavy and slow. I strongly recommend reading this blog post: https://allegro.tech/2015/08/spark-kafka-integration.html. I used it, tested it, and it's the best option; I've also run it in a production environment. The idea boils down to sharing one lazily created producer per executor, as sketched below.
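A minimal sketch of the pattern that post describes, assuming String keys and values; the KafkaSink and createProducer names are illustrative rather than taken verbatim from the post. Only the serializable factory function is shipped to the executors, and the producer itself is built lazily, once per executor JVM:

import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

class KafkaSink(createProducer: () => KafkaProducer[String, String]) extends Serializable {
  // lazy val: the producer is instantiated on first use on the executor,
  // never on the driver, and reused for every record afterwards
  lazy val producer: KafkaProducer[String, String] = createProducer()

  def send(topic: String, value: String): Unit =
    producer.send(new ProducerRecord[String, String](topic, value))
}

object KafkaSink {
  def apply(config: Properties): KafkaSink = {
    val createProducer = () => {
      val producer = new KafkaProducer[String, String](config)
      // flush and close the producer when the executor JVM shuts down
      sys.addShutdownHook {
        producer.close()
      }
      producer
    }
    new KafkaSink(createProducer)
  }
}

On the driver you would then broadcast it once and reuse it from every task, e.g. val kafkaSink = sc.broadcast(KafkaSink(props)) followed by rdd.foreach(line => kafkaSink.value.send("testtopic", line)).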
KafkaProducer isn't serializable. You'll need to move the creation of the instance to inside foreachPartition:
requestSet.foreachPartition((partitions: Iterator[String]) => {
  // Create the producer on the executor, once per partition, instead of
  // closing over a non-serializable instance built on the driver.
  val producer: KafkaProducer[String, String] = new KafkaProducer[String, String](props)
  partitions.foreach((line: String) => {
    try {
      producer.send(new ProducerRecord[String, String]("testtopic", line))
    } catch {
      case ex: Exception =>
        log.warn(ex.getMessage, ex)
    }
  })
  // Flush buffered records and release this partition's connection.
  producer.close()
})
Note that KafkaProducer.send returns a java.util.concurrent.Future[RecordMetadata] and is asynchronous; the main exception that can propagate synchronously from the call is a SerializationException, thrown when the key or value can't be serialized. Broker-side failures only surface through the returned Future or a Callback.
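If you want to catch those asynchronous failures as well, you can block on the returned Future or, more cheaply, pass a Callback as the second argument to send. Below is a minimal sketch of the callback form, reusing the producer, line, and log values from the snippet above:

import org.apache.kafka.clients.producer.{Callback, ProducerRecord, RecordMetadata}

// Report asynchronous failures (broker down, record too large, ...) that the
// try/catch above never sees, because they are not thrown synchronously by send().
producer.send(new ProducerRecord[String, String]("testtopic", line), new Callback {
  override def onCompletion(metadata: RecordMetadata, exception: Exception): Unit = {
    if (exception != null) log.warn(exception.getMessage, exception)
  }
})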