apache-kafka-streams

How does Kafka Streams work with partitions that contain incomplete data?

无人久伴 submitted on 2019-12-06 01:04:18
Question: The Kafka Streams engine maps each partition to exactly one worker (i.e. Java app instance), so that all messages in that partition are processed by that worker. I have the following scenario and am trying to understand whether it is still feasible. I have a topic A (with 3 partitions). The messages sent to it are partitioned randomly by Kafka (i.e. there is no key). The messages I send to it have a schema like this:

```
{carModel: "Honda", color: "Red", timeStampEpoch: 14334343342}
```

Since I have 3 …
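
A common pattern for this situation, sketched below against a 2.x Streams API: because the producer writes without a key, records for the same car model are scattered across all three partitions, so the stream must be re-keyed by the grouping attribute before aggregating; grouping after `selectKey` then repartitions through an internal topic so each model lands on a single task. The `CarEvent` type, its `carEventSerde`, the `getCarModel()` accessor, and the topic name are illustrative placeholders, not from the question.

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.KTable;

StreamsBuilder builder = new StreamsBuilder();
KTable<String, Long> countsByModel = builder
        // carEventSerde: assumed serde for the JSON payload above
        .stream("topic-a", Consumed.with(Serdes.String(), carEventSerde))
        // re-key by the attribute we want to aggregate on (the key is null so far)
        .selectKey((nullKey, event) -> event.getCarModel())
        // grouping after selectKey forces a repartition through an internal
        // topic, so every "Honda" record ends up on the same partition/task
        .groupByKey(Grouped.with(Serdes.String(), carEventSerde))
        .count();
```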

Consume the latest value from a topic for each key

前提是你 submitted on 2019-12-05 14:11:36
I have a Kafka producer which produces messages at a high rate (the message key is, say, a username and the value is his current score in a game). The Kafka consumer is relatively slow in processing the consumed messages. My requirement here is to show the most up-to-date score and avoid showing stale data, with the trade-off that some scores may never be shown. Essentially, for each username I may have hundreds of messages in the same partition, but I always want to read only the latest one. A crude solution which has been implemented was like this: the producer sends just a key as each message …
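
One approach that matches the "latest value per key" requirement, sketched under the assumption that scores are Long values on a topic named "scores": read the topic as a KTable, whose update semantics (plus record caching) mean downstream logic sees the current score per username rather than every intermediate update.

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.KTable;

StreamsBuilder builder = new StreamsBuilder();
// A KTable keeps only the latest value per key; record caches additionally
// collapse bursts of updates to the same username before they are emitted.
KTable<String, Long> latestScores =
        builder.table("scores", Consumed.with(Serdes.String(), Serdes.Long()));

latestScores.toStream().foreach((user, score) ->
        System.out.println(user + " -> " + score));
```

Making the topic log-compacted preserves the same latest-per-key semantics on restore, at the cost that intermediate scores are gone for good, which matches the stated trade-off.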

How to create a state store with a HashMap as the value in Kafka Streams?

依然范特西╮ submitted on 2019-12-05 09:31:33
I need to create a state store with a String key and a HashMap as the value. I tried the two methods below.

```java
// First method
StateStoreSupplier avgStoreNew = Stores.create("AvgsNew")
        .withKeys(Serdes.String())
        .withValues(HashMap.class)
        .persistent()
        .build();

// Second method
HashMap<String, Double> h = new HashMap<String, Double>();
StateStoreSupplier avgStore1 = Stores.create("Avgs")
        .withKeys(Serdes.String())
        .withValues(Serdes.serdeFrom(h.getClass()))
        .persistent()
        .build();
```

The code compiles fine without any error, but I get a runtime error: io.confluent.examples.streams.WordCountProcessorAPIException …
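
For reference, a sketch of what the old (0.10.x) Stores API expects: `withValues()` needs a Serde that can actually serialize and deserialize a HashMap, not the HashMap class itself or a serde derived from its runtime class. `JsonPOJOSerializer`/`JsonPOJODeserializer` below stand in for serializer/deserializer implementations you would provide yourself (e.g. JSON-backed); they are assumptions, not library classes.

```java
import java.util.HashMap;
import org.apache.kafka.common.serialization.Serde;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.processor.StateStoreSupplier;
import org.apache.kafka.streams.state.Stores;

// Build a Serde from a hand-written serializer/deserializer pair that knows
// how to turn a HashMap<String, Double> into bytes and back (assumed classes).
Serde<HashMap<String, Double>> hashMapSerde = Serdes.serdeFrom(
        new JsonPOJOSerializer<>(),    // hypothetical JSON serializer
        new JsonPOJODeserializer<>()); // hypothetical JSON deserializer

StateStoreSupplier avgStore = Stores.create("Avgs")
        .withKeys(Serdes.String())
        .withValues(hashMapSerde)
        .persistent()
        .build();
```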

Kafka Streams with lookup data on HDFS

╄→гoц情女王★ submitted on 2019-12-05 08:21:39
I'm writing an application with Kafka Streams (v0.10.0.1) and would like to enrich the records I'm processing with lookup data. This data (a timestamped file) is written into an HDFS directory on a daily basis (or 2-3 times a day). How can I load it in the Kafka Streams application and join it to the actual KStream? What would be the best practice for rereading the data from HDFS when a new file arrives there? Or would it be better to switch to Kafka Connect and write the RDBMS table content to a Kafka topic which can be consumed by all the Kafka Streams application instances? Update: As suggested …
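
On the Kafka Connect route the question ends with: once the lookup data lives in a (compacted) topic, a GlobalKTable is the usual join partner, since every application instance receives a full copy of it. Note that GlobalKTable arrived after the 0.10.0.1 used in the question, so the sketch below assumes a newer Streams version; topic names, serdes, and the key-selection logic are placeholders.

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.GlobalKTable;

StreamsBuilder builder = new StreamsBuilder();
// The lookup table, kept up to date by Connect writing into "lookup-topic"
GlobalKTable<String, String> lookup =
        builder.globalTable("lookup-topic", Consumed.with(Serdes.String(), Serdes.String()));

builder.stream("events", Consumed.with(Serdes.String(), Serdes.String()))
       .join(lookup,
             (eventKey, eventValue) -> eventKey,  // map each record to its lookup key
             (eventValue, lookupValue) -> eventValue + "|" + lookupValue) // enrich
       .to("enriched-events");
```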

How to unit test a Kafka Streams application that uses a session window

喜你入骨 submitted on 2019-12-05 07:56:16
I am working with Kafka Streams 2.1. I am trying to write some tests for a stream application that aggregates events by their key (i.e. by a correlation ID) using a session window with an inactivity gap of 300 ms. Here is the aggregation implementation, represented by a method:

```java
private static final int INACTIVITY_GAP = 300;

public KStream<String, AggregatedCustomObject> aggregate(KStream<String, CustomObject> source) {
    return source
            // group by key (i.e. by correlation ID)
            .groupByKey(Grouped.with(Serdes.String(), new CustomSerde()))
            // Define a session window with an inactivity gap of 300 ms
            …
```
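
A sketch of how such a topology is typically tested with the 2.1-era TopologyTestDriver: session windows are event-time driven, so the test controls record timestamps, and a record arriving more than 300 ms after the previous one starts a new session. The `topology()` helper (assumed to build a Topology around the `aggregate` method above), the topic name, and the String serdes are assumptions.

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.StringSerializer;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.TopologyTestDriver;
import org.apache.kafka.streams.test.ConsumerRecordFactory;

Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "session-window-test");
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "dummy:1234"); // never contacted

TopologyTestDriver driver = new TopologyTestDriver(topology(), props);
ConsumerRecordFactory<String, String> factory =
        new ConsumerRecordFactory<>("input", new StringSerializer(), new StringSerializer());

driver.pipeInput(factory.create("input", "corr-1", "event-1", 0L));
driver.pipeInput(factory.create("input", "corr-1", "event-2", 100L));  // within the gap: same session
driver.pipeInput(factory.create("input", "corr-1", "event-3", 1000L)); // gap > 300 ms: new session
// read the output topic here and assert on the aggregated sessions
driver.close();
```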

Kafka Streams: Proper way to exit on error

眉间皱痕 submitted on 2019-12-05 07:36:15
I've been successful in getting a Streams app to consume, transform and produce data, but I've noticed that periodically the streams processor will transition to a state of ERROR and the process will sit there without exiting, showing me logs like:

```
All stream threads have died. The instance will be in error state and should be closed.
```

Is there a way to tell the Streams app to exit once it has reached the ERROR state? Maybe a monitor thread of sorts? I see references in the comments of the Kafka Streams code to the user needing to close the application once it has reached this state; however, I …
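
KafkaStreams#setStateListener is the usual hook for this. A minimal sketch follows; `topology` and `props` are placeholders, and exiting via System.exit is one pragmatic choice among several (a health-check flag or a latch would also work).

```java
import org.apache.kafka.streams.KafkaStreams;

KafkaStreams streams = new KafkaStreams(topology, props);
streams.setStateListener((newState, oldState) -> {
    if (newState == KafkaStreams.State.ERROR) {
        // close() from a fresh thread so we don't block the thread
        // delivering the state-change callback
        new Thread(() -> {
            streams.close();
            System.exit(1); // let the supervisor (systemd, k8s, ...) restart us
        }).start();
    }
});
streams.start();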

Understanding transactions in a Processor implementation in Kafka Streams

半世苍凉 submitted on 2019-12-05 07:28:03
Question: While using the Processor API of Kafka Streams, I use something like this:

```java
context.forward(key, value);
context.commit();
```

What I'm actually doing here is forwarding a state from the state store to the sink every minute (using context.schedule() in the init() method). What I don't understand is this: the [key, value] pair I forward and then commit() is taken from the state store. It is aggregated according to my specific logic from many non-sequential input [key, value] pairs. Each such output [key …
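
For context, a sketch of the pattern the question describes, written against the 2.x Processor API; the store name, value types, and aggregation logic are placeholders:

```java
import java.time.Duration;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.processor.AbstractProcessor;
import org.apache.kafka.streams.processor.ProcessorContext;
import org.apache.kafka.streams.processor.PunctuationType;
import org.apache.kafka.streams.state.KeyValueIterator;
import org.apache.kafka.streams.state.KeyValueStore;

public class ForwardingProcessor extends AbstractProcessor<String, Long> {
    private KeyValueStore<String, Long> store;

    @Override
    @SuppressWarnings("unchecked")
    public void init(ProcessorContext context) {
        super.init(context);
        store = (KeyValueStore<String, Long>) context.getStateStore("agg-store");
        // once a minute, walk the store and forward each aggregated pair
        context.schedule(Duration.ofMinutes(1), PunctuationType.WALL_CLOCK_TIME, timestamp -> {
            try (KeyValueIterator<String, Long> it = store.all()) {
                while (it.hasNext()) {
                    KeyValue<String, Long> entry = it.next();
                    context.forward(entry.key, entry.value);
                }
            }
            // commit() only *requests* a commit: the runtime later commits the
            // input offsets together with flushing the forwarded records, so
            // consumed offsets and produced output move as one unit
            context.commit();
        });
    }

    @Override
    public void process(String key, Long value) {
        store.put(key, value); // the question's aggregation logic would go here
    }
}
```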

Kafka Streams: use the same `application.id` to consume from multiple topics

只谈情不闲聊 submitted on 2019-12-05 03:25:30
I have an application that needs to listen to multiple different topics; each topic has separate logic for how the messages are handled. I had thought to use the same Kafka properties for each KafkaStreams instance, but I get an error like the one below.

Error:

```
java.lang.IllegalArgumentException: Assigned partition my-topic-1 for non-subscribed topic regex pattern; subscription pattern is my-other-topic
```

Code (Kotlin):

```kotlin
class KafkaSetup() {
    companion object {
        private val LOG = LoggerFactory.getLogger(this::class.java)
    }

    fun getProperties(): Properties {
        val properties = Properties()
        properties.put…
```
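
The usual resolution, sketched below: an application.id identifies one logical topology, so rather than several KafkaStreams instances sharing the same ID, build a single topology that subscribes to all topics and attach the per-topic logic inside it. `handleMyTopic`/`handleOther` are placeholder handlers, and `getProperties()` stands for the question's own properties helper.

```java
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;

StreamsBuilder builder = new StreamsBuilder();
// each topic gets its own branch of the same topology
builder.<String, String>stream("my-topic").foreach((k, v) -> handleMyTopic(k, v));
builder.<String, String>stream("my-other-topic").foreach((k, v) -> handleOther(k, v));

// one KafkaStreams instance, one application.id, both subscriptions
KafkaStreams streams = new KafkaStreams(builder.build(), getProperties());
streams.start();
```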

Streaming messages to multiple topics

淺唱寂寞╮ submitted on 2019-12-04 22:03:00
Question: I have a single master topic and multiple predicates, each of which has an output topic associated with it. I want to send each record to ALL topics whose predicate resolves to true. I am using Luwak to test which predicates a record satisfies (to use this library you evaluate a document against a list of predicates and it tells you which ones matched, i.e. I only call it once to get the list of satisfied predicates). I am trying to use Kafka Streams for this, but there doesn't seem to be …
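
One way this is commonly modeled, sketched under the assumption of String keys and values: `branch()` routes each record only to its FIRST matching predicate, but attaching an independent `filter()` + `to()` per predicate sends a record to every topic whose predicate matches. The predicate map below is a stand-in for whatever Luwak reports.

```java
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Predicate;

StreamsBuilder builder = new StreamsBuilder();
KStream<String, String> master = builder.stream("master-topic");

// placeholder predicates; in the question these would come from Luwak matches
Map<String, Predicate<String, String>> routes = new HashMap<>();
routes.put("topic-red",  (k, v) -> v.contains("red"));
routes.put("topic-blue", (k, v) -> v.contains("blue"));

// one filter per predicate: a record matching several predicates
// flows through all of the corresponding branches
routes.forEach((topic, predicate) -> master.filter(predicate).to(topic));
```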

Kafka - problems with TimestampExtractor

南楼画角 submitted on 2019-12-04 21:53:21
Question: I use org.apache.kafka:kafka-streams:0.10.0.1. I'm attempting to work with a time-series-based stream that doesn't seem to trigger KStream.process() ("punctuate"). (See here for reference.) In the KafkaStreams config I'm passing in this param (among others):

```java
config.put(StreamsConfig.TIMESTAMP_EXTRACTOR_CLASS_CONFIG, EventTimeExtractor.class.getName());
```

Here, EventTimeExtractor is a custom timestamp extractor (that implements org.apache.kafka.streams.processor…
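
For reference, a sketch of such an extractor against the 0.10.0.x interface (later versions add a second `previousTimestamp` parameter); the `MyEvent` payload type and its `getTimeStampEpoch()` accessor are assumptions. Worth noting for the punctuate symptom: in these old versions punctuate() is driven by stream time, so it only fires as new records (and thus new extracted timestamps) arrive.

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.streams.processor.TimestampExtractor;

public class EventTimeExtractor implements TimestampExtractor {
    @Override
    public long extract(ConsumerRecord<Object, Object> record) {
        MyEvent event = (MyEvent) record.value(); // assumed deserialized payload type
        // use the event's own timestamp; fall back to the record timestamp
        return event != null ? event.getTimeStampEpoch() : record.timestamp();
    }
}
```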