I've been looking through a lot of the Kafka documentation for a Java application that I am working on. I've tried getting into the lambda syntax introduced in Java 8, but
If you use Kafka Streams, you need to apply functions/operators to your data streams. In your case, you create a KStream object, so you want to apply an operator to source.
Depending on what you want to do, there are operators that apply a function to each record in the stream independently (e.g., map()), and other operators that apply a function to multiple records together (e.g., aggregateByKey()). Have a look at the documentation: http://docs.confluent.io/3.0.0/streams/developer-guide.html#kafka-streams-dsl and the examples: https://github.com/confluentinc/kafka-streams-examples
Thus, you never create local variables using Kafka Streams as you show in your example above, but rather embed everything in operators/functions that get chained together.
For example, if you want to print all input records to stdout, you could do:
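// Read the "in-stream" topic with String keys and values.
KStream<String, String> source = builder.stream(stringSerde, stringSerde, "in-stream");
source.foreach(new ForeachAction<String, String>() {
    @Override
    public void apply(String key, String value) {
        // Called once for each record of the input topic.
        System.out.println(key + ": " + value);
    }
});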
Thus, after you start your application via streams.start(), it will consume the records from your input topic, and for each record a call to apply(...) is made, which prints the record to stdout.
Of course, a more native way of printing the stream to the console would be to use source.print() (which internally is basically the same as the foreach() operator shown above, with an already given ForeachAction).
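Since the question mentions Java 8 lambdas: ForeachAction declares a single apply(key, value) method, so the anonymous class in the foreach() example can also be written as a lambda. The sketch below illustrates the equivalence in plain Java without a Kafka dependency. RecordAction, runOn and format are hypothetical stand-ins invented for this illustration, not part of Kafka Streams.

```java
import java.util.ArrayList;
import java.util.List;

final class LambdaVsAnonymousClass {
    /** Hypothetical stand-in with the same shape as ForeachAction: one method taking key and value. */
    interface RecordAction<K, V> {
        void apply(K key, V value);
    }

    /** Formats a record the same way as the foreach() example. */
    static String format(String key, String value) {
        return key + ": " + value;
    }

    /** Runs the action on one sample record; a stream would call it once per record. */
    static void runOn(String key, String value, RecordAction<String, String> action) {
        action.apply(key, value);
    }

    public static void main(String[] args) {
        List<String> seen = new ArrayList<>();

        // Anonymous inner class, as in the foreach() example.
        runOn("k1", "v1", new RecordAction<String, String>() {
            @Override
            public void apply(String key, String value) {
                seen.add(format(key, value));
            }
        });

        // Equivalent Java 8 lambda: same behaviour, less boilerplate.
        runOn("k1", "v1", (key, value) -> seen.add(format(key, value)));

        System.out.println(seen); // prints [k1: v1, k1: v1]
    }
}
```

Applied to the stream itself, the same idea would read source.foreach((key, value) -> System.out.println(key + ": " + value));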
For your example of assigning the string to a local variable, you would need to put your code into apply(...) and do your regex work etc. there to "extract the 3 letter keywords".
The best way to express this, however, would be a combination of flatMapValues() and print() (i.e., source.flatMapValues(...).print()). flatMapValues() is called for each input record (in your case, I assume the key will be null, so you can ignore it). Within your flatMapValues() function, you apply your regex, and for each match you add the match to a list of values that you finally return.
source.flatMapValues(new ValueMapper<String, Iterable<String>>() {
    @Override
    public Iterable<String> apply(String value) {
        ArrayList<String> keywords = new ArrayList<String>();
        // apply regex to value and for each match add it to keywords
        return keywords;
    }
});
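The regex step left as a comment in apply() could be filled in along these lines. This is a minimal sketch in plain Java (no Kafka dependency), assuming that a "3 letter keyword" means a standalone word of exactly three ASCII letters. The class name KeywordExtractor, the method extract and the pattern itself are illustrative choices, not taken from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class KeywordExtractor {
    // Assumption: a keyword is a standalone run of exactly three ASCII letters.
    private static final Pattern THREE_LETTERS = Pattern.compile("\\b[A-Za-z]{3}\\b");

    /** Returns every three-letter word found in value, in order of appearance. */
    static List<String> extract(String value) {
        List<String> keywords = new ArrayList<>();
        if (value == null) {
            return keywords; // a null record value yields no keywords
        }
        Matcher matcher = THREE_LETTERS.matcher(value);
        while (matcher.find()) {
            keywords.add(matcher.group());
        }
        return keywords;
    }

    public static void main(String[] args) {
        // prints [the, fox, ate, one, pie]
        System.out.println(extract("the quick brown fox ate one pie"));
    }
}
```

The loop inside extract() is what would go into ValueMapper#apply() in place of the placeholder comment.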
The output of flatMapValues() will again be a KStream, containing one record for each keyword found (i.e., the output stream is a "union" over all lists you return in ValueMapper#apply()). Finally, you just print your result to the console via print().
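To make the "union over all lists" behaviour concrete, here is a small plain-Java sketch of the idea. It is only a stand-in for what the operator does, not the Kafka Streams implementation: it ignores keys, works on an in-memory list instead of a stream, and the names FlatMapValuesSketch and splitter are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

final class FlatMapValuesSketch {
    /**
     * Plain-Java stand-in for the described behaviour (not the Kafka Streams API):
     * calls mapper once per input value and concatenates the returned lists.
     */
    static <V, R> List<R> flatMapValues(List<V> input, Function<V, ? extends Iterable<R>> mapper) {
        List<R> output = new ArrayList<>();
        for (V value : input) {
            for (R result : mapper.apply(value)) {
                output.add(result); // one output record per returned element
            }
        }
        return output;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("foo bar", "", "baz");
        // Hypothetical mapper: split on whitespace; an empty value yields an empty list.
        Function<String, List<String>> splitter =
            v -> v.isEmpty() ? new ArrayList<String>() : Arrays.asList(v.split("\\s+"));
        System.out.println(flatMapValues(input, splitter)); // prints [foo, bar, baz]
    }
}
```

Note that an input value for which the mapper returns an empty list contributes nothing to the output, so records without any keyword simply disappear from the result.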
(Of course, you could also use a single foreach() instead of flatMapValues() + print(), but this would be less modular.)