apache-kafka-streams

Failed to delete the state directory in IDE for Kafka Stream Application

五迷三道 submitted on 2019-12-01 05:21:52
I am developing a simple Kafka Streams application which extracts messages from one topic and puts them into another topic after a transformation. I am using IntelliJ for my development. When I debug/run this application, it works perfectly if my IDE and the Kafka server are sitting on the SAME machine (i.e. with BOOTSTRAP_SERVERS_CONFIG = localhost:9092 and SCHEMA_REGISTRY_URL_CONFIG = localhost:8081). However, when I try to use another machine to do the development (i.e. with BOOTSTRAP_SERVERS_CONFIG = XXX.XXX.XXX:9092 and SCHEMA_REGISTRY_URL_CONFIG = XXX.XXX.XXX:8081, where XXX.XXX.XXX is the ip
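For reference, a minimal configuration sketch for the remote-broker setup described above; the application id, the state directory path, and the use of "schema.registry.url" (the property name read by Confluent's Avro Serdes) are assumptions, not taken from the post:

    import java.util.Properties;
    import org.apache.kafka.streams.StreamsConfig;

    Properties props = new Properties();
    props.put(StreamsConfig.APPLICATION_ID_CONFIG, "simple-transform-app");
    // Remote broker instead of localhost; replace with the actual host or IP.
    props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "XXX.XXX.XXX:9092");
    // Schema Registry endpoint used by the Avro Serdes.
    props.put("schema.registry.url", "http://XXX.XXX.XXX:8081");
    // Local state lives under this directory (default /tmp/kafka-streams); pointing it
    // at a path the IDE process can write to and clean up is one thing to check when
    // deletion of the state directory fails.
    props.put(StreamsConfig.STATE_DIR_CONFIG, "/tmp/kafka-streams-dev");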

How to output result of windowed aggregation only when window is finished? [duplicate]

百般思念 submitted on 2019-12-01 05:21:33
This question already has an answer here: How to send final kafka-streams aggregation result of a time windowed KTable? (2 answers) I have a KStream in which I want to count some dimension of the events. I do it as follows: KTable<Windowed<Long>, Counter> ret = input.groupByKey() .windowedBy(TimeWindows.of(Duration.of(10, SECONDS))) .aggregate(Counter::new, (k, v, c) -> new Counter(c.count + v.getDimension())); I want to have a new KStream with those aggregations as events. I can do that easily like this: ret.toStream().to("output"); The problem is that every event in the "input" topic will produce an
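One way to get a single result per window, available since Kafka 2.1, is the suppress() operator. Below is a minimal sketch that uses count() instead of the custom Counter aggregate, with assumed topic names "input" and "output" and serde configuration omitted, so it stays self-contained:

    import java.time.Duration;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.kstream.KStream;
    import org.apache.kafka.streams.kstream.Suppressed;
    import org.apache.kafka.streams.kstream.TimeWindows;

    StreamsBuilder builder = new StreamsBuilder();
    KStream<String, String> input = builder.stream("input");

    input.groupByKey()
         // 10-second windows with a 30-second grace period for late records.
         .windowedBy(TimeWindows.of(Duration.ofSeconds(10)).grace(Duration.ofSeconds(30)))
         .count()
         // Hold updates back until the window (plus grace) has closed, so each window
         // produces exactly one downstream record instead of one per input event.
         .suppress(Suppressed.untilWindowCloses(Suppressed.BufferConfig.unbounded()))
         .toStream((windowedKey, count) -> windowedKey.key())
         .to("output");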

Kafka Streams does not increment offset by 1 when producing to topic

帅比萌擦擦* submitted on 2019-12-01 05:18:36
Question: I have implemented a simple Kafka dead-letter record processor. It works perfectly when using records produced from the console producer. However, I find that our Kafka Streams applications do not guarantee that, when producing records to the sink topics, the offsets will be incremented by 1 for each record produced. Dead letter processor background: I have a scenario where records may be received before all the data required to process them has been published. When records are not matched for processing by
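For context, a rough sketch (not the poster's actual code) of the dead-letter routing described in that background, assuming String values, made-up topic names, and a crude "MISS:"/"HIT:" tag in place of real value types:

    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.kstream.KStream;
    import org.apache.kafka.streams.kstream.KTable;

    StreamsBuilder builder = new StreamsBuilder();
    KStream<String, String> events = builder.stream("events");
    KTable<String, String> reference = builder.table("reference-data");

    // Try to enrich each event; tag it so matched and unmatched records can be split.
    KStream<String, String> attempted = events.leftJoin(reference,
            (event, ref) -> ref == null ? "MISS:" + event : "HIT:" + ref + "|" + event);

    // Reference data not published yet: park the original event on the dead-letter topic.
    attempted.filter((key, value) -> value.startsWith("MISS:"))
             .mapValues(value -> value.substring("MISS:".length()))
             .to("events-dead-letter");

    // Reference data found: forward the enriched record downstream.
    attempted.filter((key, value) -> value.startsWith("HIT:"))
             .to("events-enriched");

Separately, if exactly-once processing is enabled, transaction commit markers also occupy offsets in the sink topic, so downstream logic should not assume that consecutive records produced by a Streams app differ by exactly 1.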

Kafka Streams and RPC: is calling REST service in map() operator considered an anti-pattern?

浪子不回头ぞ submitted on 2019-12-01 04:27:57
The naive approach to implementing the use case of enriching an incoming stream of events stored in Kafka with reference data is to call, in the map() operator, an external REST service that provides this reference data for each incoming event. eventStream.map((key, event) -> /* query the external service here, then return the enriched event */) Another approach is to have a second event stream with the reference data and store it in a KTable, which acts as a lightweight embedded "database", then join the main event stream with it. KStream<String, Object> eventStream = builder.stream(..., "event-topic"
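A sketch of that second approach using the newer StreamsBuilder API, with assumed topic names and plain String values (the real code would use the appropriate Serdes and value types):

    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.kstream.KStream;
    import org.apache.kafka.streams.kstream.KTable;

    StreamsBuilder builder = new StreamsBuilder();
    KStream<String, String> eventStream = builder.stream("event-topic");
    // Reference data is just another topic, materialized locally as a KTable.
    KTable<String, String> referenceTable = builder.table("reference-data-topic");

    eventStream
        // A left join keeps events even when no reference data exists yet.
        .leftJoin(referenceTable, (event, reference) ->
                reference == null ? event : event + " | " + reference)
        .to("enriched-events");

This keeps enrichment local to the streams task and avoids per-record network calls; the trade-off is that the reference topic must be co-partitioned with the event stream (or read as a GlobalKTable instead).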

Is it possible to access message headers with Kafka Streams?

∥☆過路亽.° submitted on 2019-12-01 04:17:00
With the addition of Headers to the records (ProducerRecord & ConsumerRecord) in Kafka 0.11, is it possible to get these headers when processing a topic with Kafka Streams? When calling methods like map on a KStream, it provides the key and the value of the record as arguments, but no way that I can see to access the headers. It would be nice if we could just map over the ConsumerRecords. For example: KStreamBuilder kStreamBuilder = new KStreamBuilder(); KStream<String, String> stream = kStreamBuilder.stream("some-topic"); stream .map((key, value) -> ... ) // can I get access to headers in methods like
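Headers are not passed to map(), but in later releases (Kafka 2.0+, via the Processor API integration) they can be read from the ProcessorContext inside transformValues()/process(). A rough sketch, with the topic name and the "source" header name as assumptions:

    import org.apache.kafka.common.header.Header;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.kstream.KStream;
    import org.apache.kafka.streams.kstream.ValueTransformerWithKey;
    import org.apache.kafka.streams.processor.ProcessorContext;

    StreamsBuilder builder = new StreamsBuilder();
    KStream<String, String> stream = builder.stream("some-topic");

    stream.transformValues(() -> new ValueTransformerWithKey<String, String, String>() {
        private ProcessorContext context;

        @Override
        public void init(ProcessorContext context) {
            this.context = context;   // gives access to the current record's metadata
        }

        @Override
        public String transform(String key, String value) {
            // Headers of the record currently being processed.
            Header source = context.headers().lastHeader("source");
            return source == null ? value : value + " (from " + new String(source.value()) + ")";
        }

        @Override
        public void close() { }
    });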

How to manage Kafka KStream to Kstream windowed join?

家住魔仙堡 submitted on 2019-12-01 04:09:53
Based on the Apache Kafka docs, KStream-to-KStream joins are always windowed joins; my question is how I can control the size of the window. Is it the same as the retention for keeping the data on the topic? Or can we, for example, keep data for 1 month but join the streams just for the past week? Is there any good example that shows a windowed KStream-to-KStream join? In my case, let's say I have 2 KStreams, kstream1 and kstream2, and I want to be able to join 10 days of kstream1 to 30 days of kstream2. That is absolutely possible. When you define your stream operator, you specify the join window size explicitly.
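As a sketch of that answer (topic names, String values, and the 10-day window are illustrative, not from the post), the window is declared on the join itself and is independent of the topics' retention:

    import java.time.Duration;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.kstream.JoinWindows;
    import org.apache.kafka.streams.kstream.KStream;

    StreamsBuilder builder = new StreamsBuilder();
    KStream<String, String> kstream1 = builder.stream("topic-1");
    KStream<String, String> kstream2 = builder.stream("topic-2");

    kstream1.join(
            kstream2,
            (left, right) -> left + "|" + right,
            // Records join when their timestamps are within 10 days of each other;
            // an asymmetric range (via JoinWindows .before()/.after()) is also possible.
            JoinWindows.of(Duration.ofDays(10)))
        .to("joined-output");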

Is it a good practice to do sync database query or restful call in Kafka streams jobs?

柔情痞子 submitted on 2019-12-01 01:47:44
I use Kafka Streams to process real-time data. In the Kafka Streams tasks, I need to access MySQL to query data and also need to call another RESTful service. All the operations are synchronous. I'm afraid the sync calls will reduce the processing capacity of the streams tasks. Is this a good practice, or is there a better way to do this? A better way to do it would be to stream your MySQL table(s) into Kafka, and access the data there. This has the advantage of decoupling your streams app from the MySQL database. If you moved away from MySQL in the future, so long as the data were still written to
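A sketch of that suggestion, assuming the MySQL table is already being streamed into a topic (for example with Kafka Connect / CDC) and using made-up topic names and String values:

    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.kstream.GlobalKTable;
    import org.apache.kafka.streams.kstream.KStream;

    StreamsBuilder builder = new StreamsBuilder();
    KStream<String, String> requests = builder.stream("requests");
    // The MySQL table, replicated into Kafka and read as a global table, so every
    // instance has a full local copy and no per-record network call is needed.
    GlobalKTable<String, String> customers = builder.globalTable("mysql.customers");

    requests
        .join(customers,
              (requestKey, requestValue) -> requestKey,              // how to derive the lookup key
              (requestValue, customerRow) -> requestValue + " | " + customerRow)
        .to("requests-enriched");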

End-of-window outer join with KafkaStreams

泪湿孤枕 submitted on 2019-11-30 22:11:45
I have a Kafka topic where I expect messages with two different key types: old and new, i.e. "1-new", "1-old", "2-new", "2-old". Keys are unique, but some might be missing. Now, using Kotlin and the KafkaStreams API, I can log those messages that have the same key id from new and old. val windows = JoinWindows.of(Duration.of(2, MINUTES).toMillis()) val newStream = stream.filter({ key, _ -> isNew(key) }) .map({key, value -> KeyValue(key.replace(NEW_PREFIX, ""), value) }) val oldStream = stream.filter({ key, _ -> isOld(key) }) .map({key, value -> KeyValue(key.replace(OLD_PREFIX, ""), value) }) val
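A sketch (in Java rather than the question's Kotlin, with assumed topic names) of the windowed outer join the excerpt is building toward:

    import java.time.Duration;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.kstream.JoinWindows;
    import org.apache.kafka.streams.kstream.KStream;

    StreamsBuilder builder = new StreamsBuilder();
    KStream<String, String> newStream = builder.stream("new-records");
    KStream<String, String> oldStream = builder.stream("old-records");

    newStream.outerJoin(
            oldStream,
            (newValue, oldValue) -> {
                // One side is null when the other key variant has not (yet) arrived.
                if (newValue == null) return "only-old:" + oldValue;
                if (oldValue == null) return "only-new:" + newValue;
                return "matched:" + newValue;
            },
            JoinWindows.of(Duration.ofMinutes(2)))
        .to("join-results");

Note that in the Kafka versions current when this was asked, outerJoin also emits intermediate results, so an "only-new"/"only-old" record can appear before its counterpart arrives; getting a single result only at the end of the window is exactly the problem this question is about.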