In our Spark Streaming job we read messages from Kafka as a stream.
For this, we use the KafkaUtils.createDirectStream API, which returns JavaPai
Use one of the versions of createDirectStream that takes a messageHandler function as a parameter. Here's what I do:
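import kafka.message.MessageAndMetadata
import kafka.serializer.DefaultDecoder
import org.apache.spark.streaming.kafka.KafkaUtils

val messages = KafkaUtils.createDirectStream[Array[Byte], Array[Byte], DefaultDecoder, DefaultDecoder, (String, Array[Byte])](
  ssc,
  kafkaParams,
  // fromOffsets argument, built with my own helper
  getPartitionsAndOffsets(topics).map(t => (t._1, t._2._1)).toMap,
  // messageHandler: keep the topic alongside the payload
  (msg: MessageAndMetadata[Array[Byte],Array[Byte]]) => { (msg.topic, msg.message)}
)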
Some of that is specific to my setup and won't mean anything to you. The relevant part is:
(msg: MessageAndMetadata[Array[Byte],Array[Byte]]) => { (msg.topic, msg.message)}
If you are not familiar with Scala, all the function does is return a Tuple2 containing msg.topic and msg.message. Your function needs to return both of these in order for you to use them downstream. You could just return the entire MessageAndMetadata object instead, which also gives you the partition, offset and key. If you only want the topic and the message, use the above.
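Since the question is about the Java API, here is a rough Java sketch of what that handler does. It is not wired to Spark or Kafka: FakeMessageAndMetadata is a hypothetical stand-in for kafka.message.MessageAndMetadata, and Map.Entry stands in for scala.Tuple2, so the snippet runs on a plain JDK. In a real job I would expect the same lambda body to go into an org.apache.spark.api.java.function.Function passed as the messageHandler argument, but treat that wiring as an assumption to check against your Spark version.

```java
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Map;
import java.util.function.Function;

final class TopicHandlerSketch {

    // Hypothetical stand-in for kafka.message.MessageAndMetadata (illustration only).
    static final class FakeMessageAndMetadata {
        private final String topic;
        private final byte[] message;

        FakeMessageAndMetadata(String topic, byte[] message) {
            this.topic = topic;
            this.message = message;
        }

        String topic() { return topic; }
        byte[] message() { return message; }
    }

    // Same idea as the Scala lambda: return the topic together with the payload,
    // so both are still available downstream.
    static final Function<FakeMessageAndMetadata, Map.Entry<String, byte[]>> MESSAGE_HANDLER =
        msg -> new SimpleImmutableEntry<>(msg.topic(), msg.message());

    public static void main(String[] args) {
        FakeMessageAndMetadata record =
            new FakeMessageAndMetadata("orders", "hello".getBytes(StandardCharsets.UTF_8));

        Map.Entry<String, byte[]> pair = MESSAGE_HANDLER.apply(record);

        // prints "orders -> hello"
        System.out.println(
            pair.getKey() + " -> " + new String(pair.getValue(), StandardCharsets.UTF_8));
    }
}
```

The point is only that every element of the resulting stream carries its topic as the first half of the pair, so downstream code can branch or group on it.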
At the bottom of the Kafka integration guide, there's an example which extracts the topic from the messages.
The relevant code in Java:
// Hold a reference to the current offset ranges, so it can be used downstream
final AtomicReference<OffsetRange[]> offsetRanges = new AtomicReference<>();

directKafkaStream.transformToPair(
  new Function<JavaPairRDD<String, String>, JavaPairRDD<String, String>>() {
    @Override
    public JavaPairRDD<String, String> call(JavaPairRDD<String, String> rdd) throws Exception {
      OffsetRange[] offsets = ((HasOffsetRanges) rdd.rdd()).offsetRanges();
      offsetRanges.set(offsets);
      return rdd;
    }
  }
).map(
  ...
).foreachRDD(
  new Function<JavaPairRDD<String, String>, Void>() {
    @Override
    public Void call(JavaPairRDD<String, String> rdd) throws IOException {
      for (OffsetRange o : offsetRanges.get()) {
        System.out.println(
          o.topic() + " " + o.partition() + " " + o.fromOffset() + " " + o.untilOffset()
        );
      }
      ...
      return null;
    }
  }
);
This can probably be collapsed into something more compact which just asks for the topic and nothing else.
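As a rough sketch of what "just the topic" could look like: the offset ranges of a batch already name their topics, so collecting the distinct ones is a short loop. FakeOffsetRange below is a hypothetical stand-in for Spark's OffsetRange so that the snippet runs without Spark on the classpath. In a real job the array would come from ((HasOffsetRanges) rdd.rdd()).offsetRanges(), and as far as I know that cast only works on the RDD that comes straight out of the direct stream, before any map or other transformation.

```java
import java.util.LinkedHashSet;
import java.util.Set;

final class BatchTopicsSketch {

    // Hypothetical stand-in for Spark's OffsetRange (illustration only).
    static final class FakeOffsetRange {
        private final String topic;
        private final int partition;

        FakeOffsetRange(String topic, int partition) {
            this.topic = topic;
            this.partition = partition;
        }

        String topic() { return topic; }
        int partition() { return partition; }
    }

    // Collect the distinct topics present in one batch, in first-seen order.
    static Set<String> topicsOf(FakeOffsetRange[] ranges) {
        Set<String> topics = new LinkedHashSet<>();
        for (FakeOffsetRange o : ranges) {
            topics.add(o.topic());
        }
        return topics;
    }

    public static void main(String[] args) {
        FakeOffsetRange[] ranges = {
            new FakeOffsetRange("orders", 0),
            new FakeOffsetRange("orders", 1),
            new FakeOffsetRange("payments", 0)
        };

        // prints "[orders, payments]"
        System.out.println(topicsOf(ranges));
    }
}
```

Note that this gives the topics per batch, not per message. If each record needs its own topic attached, the offset ranges alone are not enough and the topic has to be carried with the record.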