How to fix “java.io.NotSerializableException: org.apache.kafka.clients.consumer.ConsumerRecord” in Spark Streaming Kafka Consumer?

挽巷 2020-12-10 03:57
  • Spark 2.0.0
  • Apache Kafka 0.10.1.0
  • Scala 2.11.8

When I use Spark Streaming and Kafka integration with Kafka broker version 0.10.1, I get java.io.NotSerializableException: org.apache.kafka.clients.consumer.ConsumerRecord.

3 Answers
  • 2020-12-10 04:12

    KafkaUtils.createDirectStream creates an org.apache.spark.streaming.dstream.DStream, not an RDD. Spark Streaming creates RDDs transiently as it runs. To get hold of an RDD, use stream.foreachRDD() and then RDD.foreach to reach each object in the RDD. Those objects are Kafka ConsumerRecords, on which you call the value() method to read the message from the Kafka topic:

    stream.foreachRDD { rdd =>
      rdd.foreach { record =>
        val value = record.value()   // the message payload read from the Kafka topic
        println(value)               // or look it up in your own map, e.g. map.get(value)
      }
    }
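
    For context, a minimal sketch of how such a stream is typically created with the spark-streaming-kafka-0-10 integration (the broker address, group id, topic name, and batch interval below are placeholders, not values from the question):

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

    val ssc = new StreamingContext(new SparkConf().setAppName("kafka-example"), Seconds(5))

    // Standard consumer settings; adjust to your environment.
    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "localhost:9092",
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "example-group",
      "auto.offset.reset"  -> "latest",
      "enable.auto.commit" -> (false: java.lang.Boolean)
    )

    // The elements of this DStream are ConsumerRecord[String, String], which is why
    // printing or persisting the stream directly raises NotSerializableException.
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Array("example-topic"), kafkaParams)
    )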
    
  • 2020-12-10 04:14

    ConsumerRecord does not implement java.io.Serializable, so operations that require serialization (e.g. persist, window, or print) fail with this exception. Add the configuration below, which registers ConsumerRecord with Kryo, to avoid the error:

        sparkConf.set("spark.serializer","org.apache.spark.serializer.KryoSerialize");
        sparkConf.registerKryoClasses((Class<ConsumerRecord>[] )Arrays.asList(ConsumerRecord.class).toArray());
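
    The same configuration in Scala would be along these lines (a sketch; the app name is a placeholder):

        import org.apache.kafka.clients.consumer.ConsumerRecord
        import org.apache.spark.SparkConf

        val sparkConf = new SparkConf()
          .setAppName("kafka-example") // placeholder app name
          // Use Kryo instead of Java serialization so ConsumerRecord can be serialized.
          .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
          .registerKryoClasses(Array[Class[_]](classOf[ConsumerRecord[String, String]]))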
    
  • 2020-12-10 04:25

    The ConsumerRecord objects come from the DStream. Printing them directly fails because ConsumerRecord is not serializable; instead, extract the value from each ConsumerRecord and print that.

    Instead of stream.print(), do:

    stream.map(record => record.value().toString).print()
    

    This should solve your problem.

    GOTCHA

    For anyone else seeing this exception: any call to checkpoint on the stream triggers a persist with storageLevel = MEMORY_ONLY_SER, so don't call checkpoint until after you have mapped the ConsumerRecords to serializable values.
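
    A minimal sketch of that ordering, assuming ssc.checkpoint(...) has already been set and using a one-minute interval purely as an example:

    import org.apache.spark.streaming.Minutes

    // Map to plain, serializable values first...
    val values = stream.map(record => record.value().toString)
    // ...then checkpoint (and print) the mapped DStream, not the ConsumerRecord stream.
    values.checkpoint(Minutes(1))
    values.print()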
