apache-kafka-streams

Kafka Streams - Low-Level Processor API - RocksDB TimeToLive(TTL)

放肆的年华 submitted on 2019-12-08 01:46:28
Question: I'm experimenting with the low-level Processor API. I'm doing data aggregation on incoming records using the Processor API and writing the aggregated records to RocksDB. However, I want the records added to RocksDB to stay active only for a 24-hour period; after 24 hours a record should be deleted. This can be done by changing the TTL settings, but there is not much documentation where I can get help on this. How do I change the TTL value? What Java API should I use?
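A sketch of one possible workaround (my own suggestion, not from the question): Kafka Streams does not expose RocksDB's TTL through its configuration, but a persistent window store with a 24-hour retention period drops old entries automatically. The store name and serdes below are assumptions, and the Duration-based API requires a reasonably recent Kafka Streams version.

import java.time.Duration;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.state.StoreBuilder;
import org.apache.kafka.streams.state.Stores;
import org.apache.kafka.streams.state.WindowStore;

public class DailyStoreExample {
    // Builds a persistent window store whose retention period is 24 hours,
    // so entries older than a day are purged by the store itself.
    public static StoreBuilder<WindowStore<String, Long>> dailyStoreBuilder() {
        return Stores.windowStoreBuilder(
                Stores.persistentWindowStore(
                        "daily-aggregates",      // store name (assumed)
                        Duration.ofHours(24),    // retention period
                        Duration.ofHours(24),    // window size
                        false),                  // do not retain duplicates
                Serdes.String(),
                Serdes.Long());
    }
}

The builder can then be registered on the Topology with addStateStore() and connected to the processor as usual.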

using kafka-streams to conditionally sort a json input stream

时光怂恿深爱的人放手 submitted on 2019-12-07 21:45:26
Question: I am new to developing Kafka Streams applications. My stream processor is meant to sort JSON messages based on the value of a user key in the input JSON message.
Message 1: {"UserID": "1", "Score":"123", "meta":"qwert"}
Message 2: {"UserID": "5", "Score":"780", "meta":"mnbvs"}
Message 3: {"UserID": "2", "Score":"0", "meta":"fghjk"}
I have read in "Dynamically connecting a Kafka input stream to multiple output streams" that there is no dynamic solution. In my use case I know the user keys and…
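Since the user keys are known up front, a static branch works; here is a minimal sketch of what I would try (the topic names, the Jackson-based userId() helper, and the keys "1" and "5" are my own assumptions, not from the question):

import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import com.fasterxml.jackson.databind.ObjectMapper;

public class UserIdRouter {
    private static final ObjectMapper MAPPER = new ObjectMapper();

    // Extracts the "UserID" field from the JSON value; returns "" if parsing fails.
    private static String userId(String json) {
        try {
            return MAPPER.readTree(json).path("UserID").asText();
        } catch (Exception e) {
            return "";
        }
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "userid-router");         // assumed
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");     // assumed
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> source = builder.stream("input-json");           // topic assumed

        // One predicate per known user key, plus a catch-all branch.
        KStream<String, String>[] branches = source.branch(
                (k, v) -> "1".equals(userId(v)),
                (k, v) -> "5".equals(userId(v)),
                (k, v) -> true);
        branches[0].to("user-1-topic");
        branches[1].to("user-5-topic");
        branches[2].to("other-users-topic");

        new KafkaStreams(builder.build(), props).start();
    }
}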

Why does co-partitioning of two KStreams in Kafka require the same number of partitions for both streams?

烈酒焚心 submitted on 2019-12-07 18:14:33
Question: I wanted to know why co-partitioning of two KStreams in Kafka requires the same number of partitions for both streams, as given in the linked documentation. Answer 1: As the name "co-partition" indicates, you want to put data from different topics but with the same key onto the same Kafka Streams application instance. If you don't have the same number of partitions, it's not possible to get this behavior. Assume you have topic A with 2 partitions and topic B with 3 partitions: records with the same key can end up under different partition numbers, so no single instance is guaranteed to see both sides.
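To make the requirement concrete, here is a minimal join sketch (the topic names, serdes, and the 5-minute window are my own choices, and the Kafka Streams 2.4+ StreamJoined API is assumed). The join only works when both input topics are co-partitioned, i.e. have the same number of partitions:

import java.time.Duration;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.JoinWindows;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.StreamJoined;

public class CoPartitionedJoin {
    public static void buildTopology(StreamsBuilder builder) {
        KStream<String, String> a = builder.stream("topicA");   // must have the same
        KStream<String, String> b = builder.stream("topicB");   // partition count as topicA

        // Records with the same key are joined within a 5-minute window; this relies on
        // the same key mapping to the same partition number in both input topics.
        KStream<String, String> joined = a.join(
                b,
                (va, vb) -> va + "|" + vb,
                JoinWindows.of(Duration.ofMinutes(5)),
                StreamJoined.with(Serdes.String(), Serdes.String(), Serdes.String()));
        joined.to("joined-output");
    }
}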

Issue with ArrayList Serde in Kafka Streams API

心不动则不痛 submitted on 2019-12-07 16:21:56
Question: Based on my previous question, I am still trying to figure out what the issue with my code is. I've got the most basic topic possible: keys and values are of type Long, and this is my producer code:

public class DemoProducer {
    public static void main(String... args) {
        Producer<Long, Long> producer = new KafkaProducer<>(createProperties());
        LongStream.range(1, 100)
            .forEach(i -> LongStream.range(100, 115)
                .forEach(j -> producer.send(new ProducerRecord<>("test", i, j))));
        producer.close();
    }
    // createProperties() and the rest of the excerpt are cut off here.
}
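Since the question is about an ArrayList Serde, here is a sketch of the kind of custom Serde such an aggregation would need (my own illustration, not the poster's code; it assumes Kafka clients 2.x+, where Serializer and Deserializer have default configure/close methods so lambdas can be used):

import java.nio.ByteBuffer;
import java.util.ArrayList;
import org.apache.kafka.common.serialization.Deserializer;
import org.apache.kafka.common.serialization.Serde;
import org.apache.kafka.common.serialization.Serializer;

public class LongArrayListSerde implements Serde<ArrayList<Long>> {

    @Override
    public Serializer<ArrayList<Long>> serializer() {
        // Length-prefixed encoding: a 4-byte count followed by 8 bytes per element.
        return (topic, list) -> {
            if (list == null) return null;
            ByteBuffer buf = ByteBuffer.allocate(4 + 8 * list.size());
            buf.putInt(list.size());
            list.forEach(buf::putLong);
            return buf.array();
        };
    }

    @Override
    public Deserializer<ArrayList<Long>> deserializer() {
        return (topic, bytes) -> {
            if (bytes == null) return null;
            ByteBuffer buf = ByteBuffer.wrap(bytes);
            int size = buf.getInt();
            ArrayList<Long> list = new ArrayList<>(size);
            for (int i = 0; i < size; i++) {
                list.add(buf.getLong());
            }
            return list;
        };
    }
}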

Newly built KTable returns nothing

烂漫一生 submitted on 2019-12-07 05:49:15
Question: I am trying to use a KTable to consume events from a Kafka topic, but it returns nothing. When I use a KStream, it returns and prints objects. This is really strange. The producer and consumer can be found here.

// Not working
KTable<String, Customer> customerKTable = streamsBuilder.table("customer",
        Consumed.with(Serdes.String(), customerSerde),
        Materialized.<String, Customer, KeyValueStore<Bytes, byte[]>>as(customerStateStore.name()));
customerKTable.foreach((key, value) -> System.out.println(key + " " + value)); // println body assumed; the excerpt is cut off here
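One thing worth checking (my own guess at a common cause, not a confirmed diagnosis for this post): KTable updates pass through the record cache, so foreach() may stay silent until the cache flushes. Disabling caching makes every update visible immediately; the application id and bootstrap servers below are placeholders.

import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

public class NoCacheProps {
    public static Properties streamsProps() {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "ktable-demo");           // placeholder
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");     // placeholder
        // Flush every update through instead of buffering it in the record cache.
        props.put(StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG, 0);
        props.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, 100);
        return props;
    }
}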

How to unit test a kafka stream application that uses session window

时光毁灭记忆、已成空白 submitted on 2019-12-07 03:35:41
Question: I am working with Kafka Streams 2.1. I am trying to write some tests for a stream application that aggregates events by their key (i.e. by a correlation ID) using a session window with an inactivity gap of 300 ms. Here is the aggregation implementation, represented by a method:

private static final int INACTIVITY_GAP = 300;

public KStream<String, AggregatedCustomObject> aggregate(KStream<String, CustomObject> source) {
    return source
        // group by key (i.e. by correlation ID)
        .groupByKey(Grouped // the rest of the chain is cut off in this excerpt
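A hypothetical test sketch using kafka-streams-test-utils (the 2.1-era ConsumerRecordFactory API; topic names, String serdes, and the buildTopology() call are my assumptions, and a real test would use the CustomObject serde). The key point is to control record timestamps so the 300 ms inactivity gap is either respected or exceeded:

import java.util.Properties;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.TopologyTestDriver;
import org.apache.kafka.streams.test.ConsumerRecordFactory;

public class SessionWindowTest {

    public void aggregatesWithinInactivityGap(Topology topology) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "session-window-test");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "dummy:1234");   // never contacted

        TopologyTestDriver driver = new TopologyTestDriver(topology, props);
        ConsumerRecordFactory<String, String> factory = new ConsumerRecordFactory<>(
                "input-topic", new StringSerializer(), new StringSerializer());

        // Two records 100 ms apart fall into the same session; the third record,
        // 1000 ms later, exceeds the 300 ms inactivity gap and opens a new session.
        driver.pipeInput(factory.create("input-topic", "corr-1", "a", 0L));
        driver.pipeInput(factory.create("input-topic", "corr-1", "b", 100L));
        driver.pipeInput(factory.create("input-topic", "corr-1", "c", 1100L));

        ProducerRecord<String, String> out = driver.readOutput(
                "output-topic", new StringDeserializer(), new StringDeserializer());
        // Assertions on 'out' would go here (e.g. with JUnit).

        driver.close();
    }
}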

Kafka Streams with lookup data on HDFS

情到浓时终转凉″ submitted on 2019-12-07 02:37:22
Question: I'm writing an application with Kafka Streams (v0.10.0.1) and would like to enrich the records I'm processing with lookup data. This data (a timestamped file) is written into an HDFS directory on a daily basis (or 2-3 times a day). How can I load it in the Kafka Streams application and join it to the actual KStream? What would be the best practice for rereading the data from HDFS when a new file arrives there? Or would it be better to switch to Kafka Connect and write the RDBMS table content to a Kafka topic instead?
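One common pattern I can sketch (my suggestion, not from the question): have a connector publish the lookup data to a compacted Kafka topic and read it as a GlobalKTable, then join the record stream against it. GlobalKTable was introduced in 0.10.2, so this assumes upgrading past the 0.10.0.1 mentioned above; all topic names and the joiner are placeholders.

import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.GlobalKTable;
import org.apache.kafka.streams.kstream.KStream;

public class LookupEnrichment {
    public static void buildTopology(StreamsBuilder builder) {
        // Lookup data, kept up to date by e.g. a Kafka Connect source connector.
        GlobalKTable<String, String> lookup = builder.globalTable("lookup-topic");   // assumed
        KStream<String, String> events = builder.stream("events");                   // assumed

        KStream<String, String> enriched = events.join(
                lookup,
                (eventKey, eventValue) -> eventKey,                 // pick the lookup key
                (eventValue, lookupValue) -> eventValue + "|" + lookupValue);
        enriched.to("enriched-events");
    }
}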

Kafka Streams: Proper way to exit on error

不羁岁月 submitted on 2019-12-07 02:02:57
Question: I've been successful in getting a streams app to consume, transform, and produce data, but I've noticed that periodically the streams processor will transition to a state of ERROR and the process will sit there without exiting, showing me logs like: All stream threads have died. The instance will be in error state and should be closed. Is there a way to tell the Streams app to exit once it has reached the ERROR state? Maybe a monitor thread of sorts? I see references in the comments of the Kafka…
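A minimal sketch of what I would try (the 10-second close timeout and the exit code are arbitrary, and close(Duration) assumes a recent Kafka Streams version): register a state listener and shut down once the instance enters ERROR. close() is called from a separate thread so the thread that triggered the transition is not blocked.

import java.time.Duration;
import org.apache.kafka.streams.KafkaStreams;

public class ExitOnError {
    public static void exitOnError(KafkaStreams streams) {
        streams.setStateListener((newState, oldState) -> {
            if (newState == KafkaStreams.State.ERROR) {
                // All stream threads have died; close the instance and terminate the JVM.
                new Thread(() -> {
                    streams.close(Duration.ofSeconds(10));
                    System.exit(1);
                }).start();
            }
        });
    }
}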

state store may have migrated to another instance

老子叫甜甜 submitted on 2019-12-06 14:38:09
When I try to access the state store from the stream, I am getting the error below: The state store, count-store, may have migrated to another instance. When I try to access the ReadOnlyKeyValueStore, I get an error message saying it has migrated to another server, but I have only one broker up and running.

package com.ms.kafka.com.ms.stream;

import java.util.Properties;
import java.util.stream.Stream;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import // the rest of the class is cut off in this excerpt
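A sketch of the usual fix (my suggestion, assuming a running KafkaStreams instance; the pre-2.5 store() signature is used): the store is not queryable while the instance is still starting up or rebalancing, so wait and retry instead of querying immediately after start().

import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.errors.InvalidStateStoreException;
import org.apache.kafka.streams.state.QueryableStoreTypes;
import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;

public class StoreLookup {
    // Retries until the store is queryable; while the instance is starting up or
    // rebalancing, store() throws InvalidStateStoreException.
    public static ReadOnlyKeyValueStore<String, Long> waitForStore(KafkaStreams streams)
            throws InterruptedException {
        while (true) {
            try {
                return streams.store("count-store", QueryableStoreTypes.keyValueStore());
            } catch (InvalidStateStoreException e) {
                Thread.sleep(100);   // not ready yet, retry
            }
        }
    }
}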

Can I run a Kafka Streams application on the same machine as a Kafka broker?

℡╲_俬逩灬. submitted on 2019-12-06 13:22:23
I have a Kafka Streams application which takes data from a few topics, joins the data, and puts it in another topic.

Kafka configuration: 5 Kafka brokers; Kafka topics with 15 partitions and a replication factor of 3.

Note: I am running the Kafka Streams applications on the same machines where my Kafka brokers are running. A few million records are consumed/produced every hour. Whenever I take any Kafka broker down, it goes into rebalancing; the rebalancing takes approximately 30 minutes or sometimes even more, and many times it kills many of the Kafka Streams processes.

It is technically possible to run…
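One knob I would look at for the long rebalances (my own suggestion, not from the post): standby replicas keep a warm copy of each task's state on another instance, which shortens the restore phase after an instance or broker failure. The property below is the real StreamsConfig constant; the application id, bootstrap server, and the value of 1 are assumptions.

import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

public class StandbyConfigExample {
    public static Properties streamsProps() {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "join-app");              // assumed
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");       // assumed
        // Keep one hot standby copy of each task's state on another instance, so a
        // rebalance can promote the standby instead of rebuilding state from the
        // changelog topics.
        props.put(StreamsConfig.NUM_STANDBY_REPLICAS_CONFIG, 1);
        return props;
    }
}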