apache-kafka-streams

Kafka - This server is not the leader for that topic-partition

Submitted by 谁说我不能喝 on 2019-12-04 21:11:56
Question: I have a two-broker Kafka 0.10.2.0 cluster. The replication factor is 2. I am running a 1.0.0 Kafka Streams application against this Kafka cluster. In my Kafka Streams application, the producer config has retries = 10 and retry.backoff.ms = 100. After running for a few minutes, I observed the following logs in the Kafka server.log. Because of this, the Kafka Streams application is throwing a 'NOT_LEADER_FOR_PARTITION' exception. What could be the possible reason? Please help me. [2017-12-12 10:26:02,583] ERROR [ReplicaFetcherThread-0-1], Error
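A minimal sketch, not from the post, of how the retry settings described above are typically passed through to the Streams application's internal producer (the application id and broker addresses are placeholders):

import java.util.Properties;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.streams.StreamsConfig;

Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-streams-app");               // placeholder id
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092,broker2:9092"); // placeholder brokers
// producer-level overrides: retry a failed send up to 10 times, waiting 100 ms between attempts
props.put(StreamsConfig.producerPrefix(ProducerConfig.RETRIES_CONFIG), 10);
props.put(StreamsConfig.producerPrefix(ProducerConfig.RETRY_BACKOFF_MS_CONFIG), 100);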

Kafka Streams - reducing the memory footprint for large state stores

Submitted by ╄→尐↘猪︶ㄣ on 2019-12-04 19:15:18
I have a topology (see below) that reads off a very large topic (over a billion messages per day). The memory usage of this Kafka Streams app is pretty high, and I was looking for some suggestions on how I might reduce the footprint of the state stores (more details below). Note: I am not trying to scapegoat the state stores, I just think there may be a way for me to improve my topology - see below.

// stream receives 1 billion+ messages per day
stream
    .flatMap((key, msg) -> rekeyMessages(msg))
    .groupBy((key, value) -> key)
    .reduce(new MyReducer(), MY_REDUCED_STORE)
    .toStream()
    .to(OUTPUT
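One common lever for the state-store footprint, offered here as an assumption since the post is truncated, is a custom RocksDBConfigSetter; the sizes below are purely illustrative:

import java.util.Map;
import org.apache.kafka.streams.state.RocksDBConfigSetter;
import org.rocksdb.BlockBasedTableConfig;
import org.rocksdb.Options;

public class BoundedMemoryRocksDBConfig implements RocksDBConfigSetter {
    @Override
    public void setConfig(final String storeName, final Options options, final Map<String, Object> configs) {
        final BlockBasedTableConfig tableConfig = new BlockBasedTableConfig();
        tableConfig.setBlockCacheSize(16 * 1024 * 1024L); // shrink the per-store block cache
        options.setTableFormatConfig(tableConfig);
        options.setMaxWriteBufferNumber(2);               // fewer in-memory memtables per store
        options.setWriteBufferSize(8 * 1024 * 1024L);     // smaller memtables
    }
}

// registered via:
// props.put(StreamsConfig.ROCKSDB_CONFIG_SETTER_CLASS_CONFIG, BoundedMemoryRocksDBConfig.class);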

Monitoring number of consumer for the Kafka topic

Submitted by 浪子不回头ぞ on 2019-12-04 16:05:49
We are using Prometheus and Grafana to monitor our Kafka cluster. In our application we use Kafka Streams, and there is a chance that the Kafka Streams application gets stopped due to an exception. We log the event via setUncaughtExceptionHandler, but we also need some kind of alerting when the stream stops. What we currently have is jmx_exporter running as an agent, exposing Kafka metrics through an endpoint that Prometheus scrapes. We don't see any kind of metric that gives the count of active consumers per topic. Are we missing something? Any suggestions on how to
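A small sketch, not from the post, of the two hooks usually combined for this kind of alerting: the uncaught-exception handler the question mentions, plus a state listener that fires when the application transitions into an error state (the logger and the alerting side are assumptions):

KafkaStreams streams = new KafkaStreams(builder.build(), props); // builder and props assumed to exist

streams.setUncaughtExceptionHandler((thread, throwable) ->
    // the stream thread is about to die; log and/or bump a custom gauge that Prometheus scrapes
    log.error("Stream thread {} died", thread.getName(), throwable));

streams.setStateListener((newState, oldState) -> {
    if (newState == KafkaStreams.State.ERROR || newState == KafkaStreams.State.NOT_RUNNING) {
        log.error("Kafka Streams moved from {} to {}", oldState, newState); // alert on this log/metric
    }
});

streams.start();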

How can I get the offset value in KStream

Submitted by *爱你&永不变心* on 2019-12-04 12:59:28
I'm developing a PoC with Kafka Streams. Now I need to get the offset value in the stream consumer and use it to generate a unique key (topic-offset)->hash for each message. The reason is: the producers are syslog, and only a few of them have IDs. I cannot generate a UUID in the consumer because in case of reprocessing I need to regenerate the same key. My problem is: the org.apache.kafka.streams.processor.ProcessorContext class exposes an .offset() method that returns the value, but I'm using KStream instead of the Processor, and I couldn't find a method that returns the same thing. Anybody knows
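A sketch of one common workaround, offered as an assumption rather than the poster's solution: KStream#transform hands you the ProcessorContext, so topic, partition and offset stay reachable while the rest of the topology remains in the DSL (String key/value types are assumed):

stream.transform(() -> new Transformer<String, String, KeyValue<String, String>>() {
    private ProcessorContext context;

    @Override
    public void init(final ProcessorContext context) {
        this.context = context;
    }

    @Override
    public KeyValue<String, String> transform(final String key, final String value) {
        // build a deterministic key such as "<topic>-<partition>-<offset>" for each record
        final String newKey = context.topic() + "-" + context.partition() + "-" + context.offset();
        return KeyValue.pair(newKey, value);
    }

    // no-op punctuate for older Transformer versions that still declare it
    public KeyValue<String, String> punctuate(final long timestamp) { return null; }

    @Override
    public void close() { }
});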

How to handle error and don't commit when use Kafka Streams DSL

Submitted by 喜你入骨 on 2019-12-04 10:38:31
Question: For Kafka Streams, if we use the lower-level Processor API, we can control whether to commit or not. So if a problem happens in our code and we don't want to commit that message, Kafka will redeliver the message multiple times until the problem gets fixed. But how do we control whether the message gets committed when using the higher-level Streams DSL API? Resources: http://docs.confluent.io/2.1.0-alpha1/streams/developer-guide.html Answer 1: Your statement is not completely true. You cannot "control to
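For context, a sketch of the Processor API hook the question alludes to (mine, not from the post or the answer). Note that ProcessorContext#commit only requests a commit as soon as possible; Kafka Streams still commits on its own schedule (commit.interval.ms), so user code cannot fully suppress commits:

public class MyProcessor extends AbstractProcessor<String, String> {
    @Override
    public void process(final String key, final String value) {
        // ... business logic ...
        context().commit(); // request (not force) an offset commit
    }
}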

Why don't I see any output from the Kafka Streams reduce method?

Submitted by 纵饮孤独 on 2019-12-04 08:42:16
Given the following code:

KStream<String, Custom> stream = builder.stream(Serdes.String(), customSerde, "test_in");

stream
    .groupByKey(Serdes.String(), customSerde)
    .reduce(new CustomReducer(), "reduction_state")
    .print(Serdes.String(), customSerde);

I have a println statement inside the apply method of the Reducer, which successfully prints out when I expect the reduction to take place. However, the final print statement shown above displays nothing. Likewise, if I use a to method rather than print, I see no messages in the destination topic. What do I need after the reduce statement to see
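Not part of the original post, but one frequent cause of "nothing downstream of reduce()" is KTable record caching, which holds updates back until the cache flushes. A sketch of turning the cache off so every update is forwarded immediately:

Properties props = new Properties();
props.put(StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG, 0); // forward every update downstream
// alternatively, lower StreamsConfig.COMMIT_INTERVAL_MS_CONFIG so the cache flushes more often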

Kafka Streams - updating aggregations on KTable

Submitted by 故事扮演 on 2019-12-04 08:10:20
I have a KTable with data that looks like this (key => value), where keys are customer IDs and values are small JSON objects containing some customer data:

1 => { "name": "John", "age_group": "25-30" }
2 => { "name": "Alice", "age_group": "18-24" }
3 => { "name": "Susie", "age_group": "18-24" }
4 => { "name": "Jerry", "age_group": "18-24" }

I'd like to do some aggregations on this KTable, and basically keep a count of the number of records for each age_group. The desired KTable data would look like this:

"18-24" => 3
"25-30" => 1

Let's say Alice, who is in the 18-24 group above, has a
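A sketch of the usual shape of such a re-keyed count (my assumption, with made-up accessors and serdes). Because the source is a KTable, an update to a customer is propagated as a subtraction of the old value and an addition of the new one, so the per-group counts stay correct when a customer moves between groups:

KTable<String, Long> countsByAgeGroup = customerTable
    .groupBy((customerId, customer) -> KeyValue.pair(customer.getAgeGroup(), customer),
             Serialized.with(Serdes.String(), customerSerde))
    .count();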

Aggregration and state store retention in kafka streams

Submitted by 天大地大妈咪最大 on 2019-12-04 07:18:14
I have a use case like the following. For each incoming event, I want to look at a certain field to see if its status changed from A to B, and if so, send that event to an output topic. The flow is like this: an event with key "xyz" comes in with status A, and some time later another event comes in with key "xyz" with status B. I have this code using the high-level DSL:

final KStream<String, DomainEvent> inputStream....

final KStream<String, DomainEvent> outputStream = inputStream
    .map((k, v) -> new KeyValue<>(v.getId(), v))
    .groupByKey(Serialized.with(Serdes.String(), jsonSerde))
    .aggregate
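The post is cut off at the aggregate call, so as a rough sketch only (StatusTransition, transitionSerde and the helper methods are hypothetical): the aggregate step typically tracks the previous status per key, after which an A -> B transition can be filtered into the output topic:

KTable<String, StatusTransition> transitions = inputStream
    .map((k, v) -> new KeyValue<>(v.getId(), v))
    .groupByKey(Serialized.with(Serdes.String(), jsonSerde))
    .aggregate(
        StatusTransition::new,                                 // initializer: empty transition record
        (id, event, agg) -> agg.update(event.getStatus()),     // remember previous and current status
        Materialized.with(Serdes.String(), transitionSerde));  // hypothetical serde

transitions.toStream()
    .filter((id, t) -> t.wentFromAToB())                       // hypothetical helper
    .mapValues(StatusTransition::getCurrentEvent)              // hypothetical accessor for the B event
    .to("status-changes", Produced.with(Serdes.String(), jsonSerde)); // hypothetical output topic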

How to evaluate consuming time in kafka stream application

Submitted by 痞子三分冷 on 2019-12-04 06:05:22
Question: I have a 1.0.0 Kafka Streams application with the two classes below, 'class FilterByPolicyStreamsApp' and 'class FilterByPolicyTransformerSupplier'. In my application I read the events, perform some conditional checks, and forward them to another topic on the same Kafka cluster. I am able to get the producing time with the 'eventsForwardTimeInMs' variable in the FilterByPolicyTransformerSupplier class, but I am unable to get the consuming time (with and without (de)serialization). How can I get this time? Please help me.
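One option, offered here as an assumption rather than what the poster ended up doing, is to read the latency metrics Kafka Streams and its embedded consumer already expose instead of hand-rolled timers ('streams' is assumed to be the running KafkaStreams instance):

for (Map.Entry<MetricName, ? extends Metric> entry : streams.metrics().entrySet()) {
    final MetricName name = entry.getKey();
    // e.g. "process-latency-avg" from the stream task metrics, or "fetch-latency-avg" from the consumer
    if (name.name().contains("latency")) {
        System.out.println(name.group() + " / " + name.name() + " = " + entry.getValue().metricValue());
    }
}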

How does Kafka Streams work with Partitions that contain incomplete Data?

Submitted by 与世无争的帅哥 on 2019-12-04 05:58:38
The Kafka Streams engine maps a partition to exactly one worker (i.e. Java app), so that all messages in that partition are processed by that worker. I have the following scenario, and am trying to understand if it is still feasible for it to work. I have a topic A (with 3 partitions). The messages sent to it are partitioned randomly by Kafka (i.e. there is no key). The messages I send to it have a schema like the one below:

{carModel: "Honda", color: "Red", timeStampEpoch: 14334343342}

Since I have 3 partitions, and the messages are partitioned randomly across them, cars of the same model could be written
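Not from the post, but for context: when records are unkeyed, grouping by a field of the value causes Kafka Streams to repartition through an internal topic, so all records for the same car model end up on the same task regardless of which partition they were originally written to. A sketch, with the topic name, carSerde and the getter as assumptions:

KTable<String, Long> carsPerModel = builder
    .stream("topic-a", Consumed.with(Serdes.String(), carSerde))
    .groupBy((nullKey, car) -> car.getCarModel(), Serialized.with(Serdes.String(), carSerde))
    .count();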