In our Spark Streaming job we read messages from Kafka.
For this, we use the KafkaUtils.createDirectStream API, which returns a JavaPairInputDStream.
At the bottom of the Kafka integration guide there is an example that extracts the topic, together with the partition and offset range, for each Kafka partition in a batch.
The relevant code in Java:
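```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicReference;

import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.streaming.kafka.HasOffsetRanges;
import org.apache.spark.streaming.kafka.OffsetRange;

// directKafkaStream is the JavaPairInputDStream<String, String> returned by KafkaUtils.createDirectStream
// Hold a reference to the current offset ranges, so it can be used downstream
final AtomicReference<OffsetRange[]> offsetRanges = new AtomicReference<>();

directKafkaStream.transformToPair(
  new Function<JavaPairRDD<String, String>, JavaPairRDD<String, String>>() {
    @Override
    public JavaPairRDD<String, String> call(JavaPairRDD<String, String> rdd) throws Exception {
      // The cast only succeeds on the RDD produced directly by the direct stream
      OffsetRange[] offsets = ((HasOffsetRanges) rdd.rdd()).offsetRanges();
      offsetRanges.set(offsets);
      return rdd;
    }
  }
).map(
  ...
).foreachRDD(
  // map(...) yields a JavaDStream, so foreachRDD receives a JavaRDD (assuming the elided map emits Strings)
  new Function<JavaRDD<String>, Void>() {
    @Override
    public Void call(JavaRDD<String> rdd) throws IOException {
      for (OffsetRange o : offsetRanges.get()) {
        System.out.println(
          o.topic() + " " + o.partition() + " " + o.fromOffset() + " " + o.untilOffset()
        );
      }
      ...
      return null;
    }
  }
);
```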
This can probably be collapsed into something more compact that asks only for the topic.
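A minimal sketch of one more compact idea, under stated assumptions. For the direct stream, each RDD partition corresponds one-to-one to a Kafka topic-partition, so partition i of a batch lines up with offsetRanges[i]; the integration guide warns that this mapping is lost after anything that shuffles or repartitions, such as reduceByKey() or window(). The code below has no Spark dependency: Range is a hypothetical stand-in for OffsetRange exposing the same topic()/partition()/fromOffset()/untilOffset() accessors, and topicsInBatch and tagWithTopic are hypothetical helper names, not Spark API.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Plain-Java sketch (no Spark): recovering topics from a batch's offset ranges. */
public class TopicTagging {

    /** Hypothetical stand-in for org.apache.spark.streaming.kafka.OffsetRange. */
    public static final class Range {
        private final String topic;
        private final int partition;
        private final long fromOffset;
        private final long untilOffset;

        public Range(String topic, int partition, long fromOffset, long untilOffset) {
            this.topic = topic;
            this.partition = partition;
            this.fromOffset = fromOffset;
            this.untilOffset = untilOffset;
        }

        public String topic() { return topic; }
        public int partition() { return partition; }
        public long fromOffset() { return fromOffset; }
        public long untilOffset() { return untilOffset; }
    }

    /** Distinct topics in one batch, in the order they first appear in the ranges. */
    public static Set<String> topicsInBatch(Range[] ranges) {
        Set<String> topics = new LinkedHashSet<>();
        for (Range r : ranges) {
            topics.add(r.topic());
        }
        return topics;
    }

    /**
     * Pairs every value of RDD partition partitionIndex with that partition's topic.
     * Assumes RDD partition i corresponds to ranges[i], which holds only before
     * any shuffle or repartition.
     */
    public static List<Map.Entry<String, String>> tagWithTopic(
            int partitionIndex, Iterator<String> values, Range[] ranges) {
        String topic = ranges[partitionIndex].topic();
        List<Map.Entry<String, String>> tagged = new ArrayList<>();
        while (values.hasNext()) {
            tagged.add(new AbstractMap.SimpleImmutableEntry<>(topic, values.next()));
        }
        return tagged;
    }
}
```

In a real job, tagWithTopic would presumably be called from rdd.mapPartitionsWithIndex(...) inside the first transform applied to the direct stream, passing the OffsetRange[] obtained through HasOffsetRanges and returning the list's iterator; that Spark wiring is not shown or tested here.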