Get topic from Kafka message in Spark

难免孤独 2020-12-20 22:38

In our Spark Streaming job, we read messages from Kafka as a stream.

For this, we use the KafkaUtils.createDirectStream API which returns JavaPai

2 answers
  •  死守一世寂寞
    2020-12-20 23:08

    At the bottom of the Kafka integration guide, there's an example which extracts the topic from the messages.

    The relevant code in Java:

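     import java.io.IOException;
     import java.util.concurrent.atomic.AtomicReference;
     import org.apache.spark.api.java.JavaPairRDD;
     import org.apache.spark.api.java.function.Function;
     import org.apache.spark.streaming.kafka.HasOffsetRanges;
     import org.apache.spark.streaming.kafka.OffsetRange;

     // Assumes String keys and values; adjust the type parameters to match your stream.
     // Hold a reference to the current offset ranges, so it can be used downstream
     final AtomicReference<OffsetRange[]> offsetRanges = new AtomicReference<>();
    
     directKafkaStream.transformToPair(
       new Function<JavaPairRDD<String, String>, JavaPairRDD<String, String>>() {
         @Override
         public JavaPairRDD<String, String> call(JavaPairRDD<String, String> rdd) throws Exception {
           // The cast only works on the RDD produced directly by the direct stream
           OffsetRange[] offsets = ((HasOffsetRanges) rdd.rdd()).offsetRanges();
           offsetRanges.set(offsets);
           return rdd;
         }
       }
     ).map(
       ...
     ).foreachRDD(
       new Function<JavaPairRDD<String, String>, Void>() {
         @Override
         public Void call(JavaPairRDD<String, String> rdd) throws IOException {
           // Each OffsetRange covers one topic-partition of the current batch
           for (OffsetRange o : offsetRanges.get()) {
             System.out.println(
               o.topic() + " " + o.partition() + " " + o.fromOffset() + " " + o.untilOffset()
             );
           }
           ...
           return null;
         }
       }
     );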
    

    This can probably be collapsed into something more compact that asks only for the topic and nothing else.
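A minimal sketch of what "just the topic" could look like, under some loud assumptions: the nested `OffsetRange` class below is a hypothetical stand-in that mirrors only the `topic()` and `partition()` accessors of Spark's class, so the snippet compiles without Spark on the classpath. `TopicsOnly`, `distinctTopics`, and the topic names are made up for illustration. In the real job, the array would come from the `HasOffsetRanges` cast shown above.

```java
import java.util.LinkedHashSet;
import java.util.Set;

class TopicsOnly {
    // Hypothetical stand-in for org.apache.spark.streaming.kafka.OffsetRange,
    // included only so this sketch compiles without Spark on the classpath.
    static final class OffsetRange {
        private final String topic;
        private final int partition;

        OffsetRange(String topic, int partition) {
            this.topic = topic;
            this.partition = partition;
        }

        String topic() { return topic; }
        int partition() { return partition; }
    }

    // Collects the distinct topic names of one batch, in first-seen order.
    static Set<String> distinctTopics(OffsetRange[] ranges) {
        Set<String> topics = new LinkedHashSet<>();
        for (OffsetRange range : ranges) {
            topics.add(range.topic());
        }
        return topics;
    }

    public static void main(String[] args) {
        // In the real job this array would come from
        // ((HasOffsetRanges) rdd.rdd()).offsetRanges(); the values here are made up.
        OffsetRange[] ranges = {
            new OffsetRange("orders", 0),
            new OffsetRange("orders", 1),
            new OffsetRange("payments", 0)
        };
        System.out.println(distinctTopics(ranges)); // prints [orders, payments]
    }
}
```

Inside the Spark job, the same loop body would sit in the `transformToPair` or `foreachRDD` callback, replacing the `println` that dumps partition and offsets.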
