Is it possible to obtain specific message offset in Kafka+SparkStreaming?

旧街凉风 提交于 2019-12-05 06:45:08

Yes, you can use MessageAndMetadata version of createDirectStream which allows you to access message metadata.

You can find example here which returns Dstream of tuple3.

val ssc = new StreamingContext(sparkConf, Seconds(10))

val kafkaParams = Map[String, String]("metadata.broker.list" -> (kafkaBroker))
var fromOffsets = Map[TopicAndPartition, Long]()
val topicAndPartition: TopicAndPartition = new TopicAndPartition(kafkaTopic.trim, 0)
val topicAndPartition1: TopicAndPartition = new TopicAndPartition(kafkaTopic1.trim, 0)
fromOffsets += (topicAndPartition -> inputOffset)
fromOffsets += (topicAndPartition1 -> inputOffset1)

val messagesDStream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder, Tuple3[String, Long, String]](ssc, kafkaParams, fromOffsets, (mmd: MessageAndMetadata[String, String]) => {
    (mmd.topic ,mmd.offset, mmd.message().toString)
  })

In above example tuple3._1 will have topic, tuple3._2 will have offset and tuple3._3 will have message.

Hope this helps!

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!