Get topic from Kafka message in Spark

难免孤独 2020-12-20 22:38

In our Spark Streaming job we read a stream of messages from Kafka.

For this, we use the KafkaUtils.createDirectStream API which returns JavaPai

2 Answers
  • 2020-12-20 22:49

    Use one of the versions of createDirectStream that takes a messageHandler function as a parameter. Here's what I do:

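    import kafka.message.MessageAndMetadata
    import kafka.serializer.DefaultDecoder
    import org.apache.spark.streaming.kafka.KafkaUtils

    // getPartitionsAndOffsets is my own helper (not shown here)
    val messages = KafkaUtils.createDirectStream[Array[Byte], Array[Byte], DefaultDecoder, DefaultDecoder, (String, Array[Byte])](
      ssc,
      kafkaParams,
      getPartitionsAndOffsets(topics).map(t => (t._1, t._2._1)).toMap,
      // messageHandler: keep the topic alongside the payload
      (msg: MessageAndMetadata[Array[Byte],Array[Byte]]) => { (msg.topic, msg.message)}
    )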
    

    Most of that is specific to my setup and won't mean anything to you. The relevant part is the message handler:

    (msg: MessageAndMetadata[Array[Byte],Array[Byte]]) => { (msg.topic, msg.message)}
    

    If you are not familiar with Scala, all the function does is return a Tuple2 containing msg.topic and msg.message. Your function needs to return both of these for you to use them downstream. You could instead return the entire MessageAndMetadata object, which also exposes fields such as the partition, offset, and key. But if you only want the topic and the message, use the above.
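
    Since the question is about the Java API, here is a minimal plain-Java sketch of the same idea. It does not call Spark or Kafka: Msg is a hypothetical stand-in for MessageAndMetadata, and TopicHandlerSketch, TOPIC_AND_MESSAGE and countByTopic are names made up for illustration. It only shows why the handler has to return the topic together with the payload: the downstream step (here, counting per topic) has nothing else to go on.

```java
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class TopicHandlerSketch {

    // Hypothetical stand-in for kafka.message.MessageAndMetadata (NOT the real class).
    static final class Msg {
        final String topic;
        final int partition;
        final long offset;
        final byte[] message;

        Msg(String topic, int partition, long offset, byte[] message) {
            this.topic = topic;
            this.partition = partition;
            this.offset = offset;
            this.message = message;
        }
    }

    // Same shape as the Scala handler: return the topic next to the payload.
    static final Function<Msg, Map.Entry<String, byte[]>> TOPIC_AND_MESSAGE =
        msg -> new SimpleEntry<>(msg.topic, msg.message);

    // Downstream step that is only possible because the handler kept the topic.
    static Map<String, Long> countByTopic(List<Msg> batch) {
        return batch.stream()
            .map(TOPIC_AND_MESSAGE)
            .collect(Collectors.groupingBy(Map.Entry::getKey, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<Msg> batch = Arrays.asList(
            new Msg("orders", 0, 10L, "a".getBytes(StandardCharsets.UTF_8)),
            new Msg("clicks", 1, 7L, "b".getBytes(StandardCharsets.UTF_8)),
            new Msg("orders", 0, 11L, "c".getBytes(StandardCharsets.UTF_8)));
        // Two messages on "orders", one on "clicks"
        System.out.println(countByTopic(batch));
    }
}
```

    In a real job the function would take a MessageAndMetadata and be passed to createDirectStream as the messageHandler, as in the Scala snippet above.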

  • 2020-12-20 23:08

    At the bottom of the Kafka integration guide, there's an example which extracts the topic from the messages.

    The relevant code in Java:

     // Hold a reference to the current offset ranges, so it can be used downstream
     final AtomicReference<OffsetRange[]> offsetRanges = new AtomicReference<>();
    
     directKafkaStream.transformToPair(
       new Function<JavaPairRDD<String, String>, JavaPairRDD<String, String>>() {
         @Override
         public JavaPairRDD<String, String> call(JavaPairRDD<String, String> rdd) throws Exception {
           OffsetRange[] offsets = ((HasOffsetRanges) rdd.rdd()).offsetRanges();
           offsetRanges.set(offsets);
           return rdd;
         }
       }
     ).map(
       ...
     ).foreachRDD(
       new Function<JavaPairRDD<String, String>, Void>() {
         @Override
         public Void call(JavaPairRDD<String, String> rdd) throws IOException {
           for (OffsetRange o : offsetRanges.get()) {
             System.out.println(
               o.topic() + " " + o.partition() + " " + o.fromOffset() + " " + o.untilOffset()
             );
           }
           ...
           return null;
         }
       }
     );
    

    This can probably be collapsed into something more compact which just asks for the topic and nothing else.
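
    As a rough idea of what the compact version boils down to, here is a plain-Java sketch that pulls only the distinct topic names out of an array of offset ranges. Range is a hypothetical stand-in for Spark's OffsetRange (only topic() and partition() are modeled), and TopicsFromRangesSketch and topicsOf are made-up names, so treat it as an illustration rather than working Spark code.

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class TopicsFromRangesSketch {

    // Hypothetical stand-in for Spark's OffsetRange (NOT the real class).
    static final class Range {
        private final String topic;
        private final int partition;

        Range(String topic, int partition) {
            this.topic = topic;
            this.partition = partition;
        }

        String topic() { return topic; }
        int partition() { return partition; }
    }

    // Collect each topic once, in first-seen order.
    static Set<String> topicsOf(Range[] ranges) {
        Set<String> topics = new LinkedHashSet<>();
        for (Range r : ranges) {
            topics.add(r.topic());
        }
        return topics;
    }

    public static void main(String[] args) {
        Range[] ranges = {
            new Range("orders", 0),
            new Range("orders", 1),
            new Range("clicks", 0)
        };
        System.out.println(topicsOf(ranges)); // prints [orders, clicks]
    }
}
```

    Note that this yields the topics present in a batch, not the topic of each individual message; for per-message topics the messageHandler approach in the other answer is the more direct route.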
