apache-kafka-streams

Periodic NPE In Kafka Streams Processor Context

Posted on 2021-02-19 04:07:35
Question: Using kafka-streams 0.10.0.0, I am periodically seeing a NullPointerException in StreamTask when forwarding a message. It varies between 10% and 50% of the invocations. The NPE occurs in this method:

    public <K, V> void forward(K key, V value) {
        ProcessorNode thisNode = currNode;
        try {
            for (ProcessorNode childNode : (List<ProcessorNode<K, V>>) thisNode.children()) {
                currNode = childNode;
                childNode.process(key, value);
            }
        } finally {
            currNode = thisNode;
        }
    }

It seems that in some cases, the …
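A known way to hit this on 0.10.x (an assumption here, since the excerpt is truncated) is calling ProcessorContext#forward from a thread other than the stream thread, for example a scheduled executor inside a custom Processor: currNode is shared mutable state, so a concurrent forward can observe it mid-update. A minimal sketch of that anti-pattern, using the old 0.10 Processor API:

    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;
    import org.apache.kafka.streams.processor.Processor;
    import org.apache.kafka.streams.processor.ProcessorContext;

    public class UnsafeForwardProcessor implements Processor<String, String> {
        private final ScheduledExecutorService timer =
                Executors.newSingleThreadScheduledExecutor();
        private ProcessorContext context;

        @Override
        public void init(ProcessorContext context) {
            this.context = context;
            // BUG: forward() runs on the timer thread and races with process(),
            // which mutates the shared currNode field on the stream thread.
            timer.scheduleAtFixedRate(
                    () -> context.forward("tick", "flush"), 1, 1, TimeUnit.SECONDS);
        }

        @Override
        public void process(String key, String value) {
            context.forward(key, value); // safe: runs on the stream thread
        }

        @Override
        public void punctuate(long timestamp) { }

        @Override
        public void close() {
            timer.shutdownNow();
        }
    }

The supported alternative is context.schedule() plus punctuate(), which keeps all forwarding on the stream thread.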

After Kafka crashed, the offsets are lost

Posted on 2021-02-18 16:56:17
Question: Our Kafka system crashed because no disk space was available. The consumers are Spring Boot applications that use the Kafka Streams API. Now every consumer application shows the following error:

    java.io.FileNotFoundException: /tmp/kafka-streams/908a79bc-92e7-4f9c-a63a-5030cf4d3555/streams.device-identification-parser/0_48/.checkpoint.tmp (No such file or directory)

This exception occurred exactly after the Kafka server was restarted. If we restart the application, the service starts at …
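The path in the stack trace is under the default state directory, /tmp/kafka-streams, which many systems clean up; a common hardening step (offered as a sketch, not as the confirmed root cause) is to move state.dir to a persistent location:

    import java.util.Properties;
    import org.apache.kafka.streams.StreamsConfig;

    public class StreamsProps {
        static Properties streamsProperties() {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "streams.device-identification-parser");
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            // Default is /tmp/kafka-streams; /tmp may be wiped on reboot or by
            // the OS, orphaning the .checkpoint files the error complains about.
            props.put(StreamsConfig.STATE_DIR_CONFIG, "/var/lib/kafka-streams");
            return props;
        }
    }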

Kafka Streams: Punctuate vs Process

Posted on 2021-02-18 06:56:19
Question: Within a single task in the streams app, do the following two methods run independently (meaning that while process is handling an incoming message from the upstream source, punctuate can also run in parallel, based on the specified schedule and WALL_CLOCK_TIME as the PunctuationType)? Or do they share the same thread, so that only one runs at a given time? If so, would punctuate never get invoked if process keeps continuously getting messages from …
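Both callbacks run on the same stream thread, so they never execute in parallel: wall-clock punctuations are interleaved between batches of process() calls, which means they can be delayed by a busy thread but are not starved indefinitely. A minimal sketch of scheduling one with the newer Processor API (names and intervals are illustrative):

    import java.time.Duration;
    import org.apache.kafka.streams.processor.PunctuationType;
    import org.apache.kafka.streams.processor.api.Processor;
    import org.apache.kafka.streams.processor.api.ProcessorContext;
    import org.apache.kafka.streams.processor.api.Record;

    public class CountingProcessor implements Processor<String, String, String, Long> {
        private long seen = 0;

        @Override
        public void init(ProcessorContext<String, Long> context) {
            // Scheduled on the stream thread itself: punctuate never runs
            // concurrently with process(), only between record batches.
            context.schedule(Duration.ofSeconds(30), PunctuationType.WALL_CLOCK_TIME,
                    timestamp -> context.forward(new Record<>("count", seen, timestamp)));
        }

        @Override
        public void process(Record<String, String> record) {
            seen++; // no forward here; the punctuator emits the running count
        }
    }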

Kafka: Delete idle consumer group id

Posted on 2021-02-17 04:48:34
Question: In some cases, I use Kafka Streams to model a small in-memory (hashmap) projection of a topic. The K,V cache does require some manipulation, so it is not a good case for a GlobalKTable. In such a "caching" scenario, I want all my sibling instances to have the same cache, so I need to bypass the consumer-group mechanism. To enable this, I normally just start my apps with a randomly generated application id, so each app reloads the topic each time it restarts. The only caveat to that is …
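One way to clean up the throwaway group ids this pattern leaves behind (an assumption about where the question is headed, since it is truncated) is to delete each group via the admin client after the instance using it has shut down; "my-random-app-id" below is a placeholder:

    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.AdminClientConfig;

    public class GroupCleanup {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            try (AdminClient admin = AdminClient.create(props)) {
                // Deletion only succeeds once the group has no active members,
                // i.e. after the instance that used this random id has stopped.
                admin.deleteConsumerGroups(Collections.singletonList("my-random-app-id"))
                     .all().get();
            }
        }
    }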

OutOfMemoryError when restarting my Kafka Streams application

Posted on 2021-02-11 16:53:30
Question: I have a Kafka Streams app (Kafka Streams 2.1 + Kafka broker 2.0) that does an aggregation based on TimeWindows, and I use the suppress operator to suppress the result's output. Everything works well until I restart my app: it then resets the offset of KTABLE-SUPPRESS-STATE-STORE to 0 to restore the suppression state, as expected. But each time I restart it, it throws an OutOfMemoryError. I thought maybe the heap size was not enough, so I used a larger Xmx/Xms; that worked for one or two restarts, …
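In 2.1 the suppress buffer is held entirely on-heap, and restarting the app replays the KTABLE-SUPPRESS-STATE-STORE changelog back into that buffer, so heap usage on restore grows with the amount of buffered state. A sketch of the topology shape in question, with illustrative topic names and durations:

    import java.time.Duration;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.Topology;
    import org.apache.kafka.streams.kstream.Suppressed;
    import org.apache.kafka.streams.kstream.TimeWindows;

    public class SuppressTopology {
        static Topology build() {
            StreamsBuilder builder = new StreamsBuilder();
            builder.stream("events")                                  // assumed input topic
                .groupByKey()
                .windowedBy(TimeWindows.of(Duration.ofMinutes(5)).grace(Duration.ofMinutes(1)))
                .count()
                // In 2.1 this buffer lives on the heap and is rebuilt from the
                // KTABLE-SUPPRESS-STATE-STORE changelog on every restart.
                .suppress(Suppressed.untilWindowCloses(Suppressed.BufferConfig.unbounded()))
                .toStream()
                .to("event-counts");                                  // assumed output topic
            return builder.build();
        }
    }

Keeping windows and grace periods tight bounds how much state has to be replayed; the heap must be sized for the worst-case buffer, since suppress in 2.1 has no disk-backed option.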

How to run Kafka Streams effectively with a single app instance and single-partition topics?

Posted on 2021-02-10 20:27:40
Question: Current setup: I am streaming data from 16 single-partition topics, doing KTable-KTable joins, and sending an output with aggregated data from all the streams. I am also materializing each KTable to a local state store. Scenario: when I tried running two app instances, I expected Kafka Streams to run on a single instance, but for some reason it ran on the other instance too. It looks like it can create stream tasks on the other app instance when Kafka Streams fails on instance #1 due to some …
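Task placement follows the topology: each sub-topology gets one task per input partition, and any repartitioning step (for example, before an aggregation) adds a sub-topology whose tasks the group coordinator is free to assign to either instance. Printing the topology shows how many independently assignable units exist; a sketch:

    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.Topology;

    public class DescribeTopology {
        public static void main(String[] args) {
            StreamsBuilder builder = new StreamsBuilder();
            // ... the 16 builder.table(...) sources and KTable-KTable joins go here ...
            Topology topology = builder.build();
            // Each "Sub-topology:" block in the output is an independently
            // assignable unit; with single-partition inputs, each yields one task.
            System.out.println(topology.describe());
        }
    }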

Why is this KStream/KTable topology propagating records that don't pass the filter?

Posted on 2021-02-09 09:13:07
Question: I have the following topology, which: creates a state store; filters records based on SOME_CONDITION; maps its values to a new entity; and finally publishes these records to another topic, STATIONS_LOW_CAPACITY_TOPIC. However, I am seeing this on STATIONS_LOW_CAPACITY_TOPIC:

    � null
    � null
    � null
    � {"id":140,"latitude":"40.4592351","longitude":"-3.6915330",...}
    � {"id":137,"latitude":"40.4591366","longitude":"-3.6894151",...}
    � null

That is to say, it's as if it were also publishing to the …
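This matches KTable#filter semantics: a record that fails the predicate is forwarded as a null tombstone so that any earlier value for that key is retracted downstream, and toStream() makes those tombstones visible. If the output topic should carry only surviving values, one option is to drop nulls after toStream(); a sketch with assumed names (the question's mapValues step is omitted):

    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.Topology;
    import org.apache.kafka.streams.kstream.KTable;

    public class StationsTopology {
        // Hypothetical stand-in for the question's SOME_CONDITION predicate.
        static boolean someCondition(String station) {
            return station != null && !station.isEmpty();
        }

        static Topology build() {
            StreamsBuilder builder = new StreamsBuilder();
            KTable<String, String> stations = builder.table("stations"); // assumed source topic
            stations
                .filter((id, station) -> someCondition(station))
                .toStream()
                // KTable#filter retracts non-matching keys by forwarding null
                // tombstones; drop them if the topic should carry values only.
                .filter((id, station) -> station != null)
                .to("STATIONS_LOW_CAPACITY_TOPIC");
            return builder.build();
        }
    }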

Is consumer offset committed even when failing to post to output topic in Kafka Streams?

Posted on 2021-02-08 08:50:34
Question: If I have a Kafka Streams application that fails to post to a topic (because the topic does not exist), does it commit the consumer offset and continue, or will it loop on the same message until it can resolve the output topic? The application merely prints an error and otherwise runs fine, from what I can observe. An example of the error when trying to post to the topic:

    Error while fetching metadata with correlation id 80 : {super.cool.test.topic=UNKNOWN_TOPIC_OR_PARTITION}

In my mind it would …
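Produce failures in Kafka Streams are routed through the production exception handler, and the default handler fails the stream thread rather than silently committing past the record; metadata warnings like the one above are first retried internally by the producer. As an illustration of that hook (not a recommendation to swallow errors), a handler that logs and continues might look like this:

    import java.util.Map;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.streams.errors.ProductionExceptionHandler;

    // Illustrative handler: log the failed record and keep processing.
    public class LogAndContinueProductionHandler implements ProductionExceptionHandler {
        @Override
        public ProductionExceptionHandlerResponse handle(
                ProducerRecord<byte[], byte[]> record, Exception exception) {
            System.err.printf("Failed to produce to %s: %s%n",
                    record.topic(), exception.getMessage());
            return ProductionExceptionHandlerResponse.CONTINUE;
        }

        @Override
        public void configure(Map<String, ?> configs) { }
    }

    // Registered via the streams config key
    // "default.production.exception.handler" (StreamsConfig
    // .DEFAULT_PRODUCTION_EXCEPTION_HANDLER_CLASS_CONFIG).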

How to display intermediate results in a windowed streaming ETL?

Posted on 2021-02-08 07:42:45
Question: We currently do real-time aggregation of data in an event store. The idea is to visualize transaction data for multiple time ranges (monthly, weekly, daily, hourly) and for multiple nominal keys. We regularly have late data, so we need to account for that. Furthermore, the requirement is to display "running" results, that is, the value of the current window even before it is complete. Currently we are using Kafka and Apache Storm (specifically Trident, i.e. micro-batches) to do this. Our …
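For comparison, Kafka Streams emits continuously refined window results by default: every update to a window is forwarded downstream, which yields the requested "running" value, and late records within the grace period update the same window and re-emit a corrected result. A hedged sketch of an hourly aggregation along those lines, with assumed topic and store names:

    import java.time.Duration;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.Topology;
    import org.apache.kafka.streams.kstream.Materialized;
    import org.apache.kafka.streams.kstream.TimeWindows;

    public class RunningHourlyCounts {
        static Topology build() {
            StreamsBuilder builder = new StreamsBuilder();
            builder.stream("transactions")                 // assumed input topic
                .groupByKey()                              // one aggregate per nominal key
                // 1h windows; records up to 1 day late still update their window.
                .windowedBy(TimeWindows.ofSizeAndGrace(Duration.ofHours(1), Duration.ofDays(1)))
                .count(Materialized.as("hourly-counts"))   // queryable running totals
                .toStream()
                // No suppress(): every update is forwarded, i.e. a running result;
                // a late record re-emits the corrected value for its window.
                .to("hourly-counts-updates");
            return builder.build();
        }
    }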
