apache-kafka-streams

How does Kafka Streams work with partitions that contain incomplete data?

无人久伴 submitted on 2019-12-06 01:04:18
Question: The Kafka Streams engine maps each partition to exactly one worker (i.e. Java app instance), so that all messages in that partition are processed by that worker. I have the following scenario and am trying to understand whether it is still feasible. I have a topic A (with 3 partitions). The messages sent to it are partitioned randomly by Kafka (i.e. there is no key). The messages I send to it have a schema like this:

```
{carModel: "Honda", color: "Red", timeStampEpoch: 14334343342}
```

Since I have 3 …
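
A common pattern for this situation, sketched below against a 2.x Streams API: because the producer writes without a key, records for the same car model are scattered across all three partitions, so the stream must be re-keyed by the grouping attribute before aggregating; grouping after `selectKey` then repartitions through an internal topic so each model lands on a single task. The `CarEvent` type, its `carEventSerde`, the `getCarModel()` accessor, and the topic name are illustrative placeholders, not from the question.

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.KTable;

StreamsBuilder builder = new StreamsBuilder();
KTable<String, Long> countsByModel = builder
        // carEventSerde: assumed serde for the JSON payload above
        .stream("topic-a", Consumed.with(Serdes.String(), carEventSerde))
        // re-key by the attribute we want to aggregate on (the key is null so far)
        .selectKey((nullKey, event) -> event.getCarModel())
        // grouping after selectKey forces a repartition through an internal
        // topic, so every "Honda" record ends up on the same partition/task
        .groupByKey(Grouped.with(Serdes.String(), carEventSerde))
        .count();
```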

Consume the latest value from a topic for each key

前提是你 submitted on 2019-12-05 14:11:36
I have a Kafka producer which produces messages at a high rate (the message key is, say, a username and the value is his current score in a game). The Kafka consumer is relatively slow in processing the consumed messages. My requirement here is to show the most up-to-date score and avoid showing stale data, with the trade-off that some scores may never be shown. Essentially, for each username I may have hundreds of messages in the same partition, but I always want to read only the latest one. A crude solution which has been implemented was like this: the producer sends just a key as each message …
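
One approach that matches the "latest value per key" requirement, sketched under the assumption that scores are Long values on a topic named "scores": read the topic as a KTable, whose update semantics (plus record caching) mean downstream logic sees the current score per username rather than every intermediate update.

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.KTable;

StreamsBuilder builder = new StreamsBuilder();
// A KTable keeps only the latest value per key; record caches additionally
// collapse bursts of updates to the same username before they are emitted.
KTable<String, Long> latestScores =
        builder.table("scores", Consumed.with(Serdes.String(), Serdes.Long()));

latestScores.toStream().foreach((user, score) ->
        System.out.println(user + " -> " + score));
```

Making the topic log-compacted preserves the same latest-per-key semantics on restore, at the cost that intermediate scores are gone for good, which matches the stated trade-off.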

How to create a state store with a HashMap as the value in Kafka Streams?

依然范特西╮ submitted on 2019-12-05 09:31:33
I need to create a state store with a String key and a HashMap as the value. I tried the two methods below.

```java
// First method
StateStoreSupplier avgStoreNew = Stores.create("AvgsNew")
        .withKeys(Serdes.String())
        .withValues(HashMap.class)
        .persistent()
        .build();

// Second method
HashMap<String, Double> h = new HashMap<String, Double>();
StateStoreSupplier avgStore1 = Stores.create("Avgs")
        .withKeys(Serdes.String())
        .withValues(Serdes.serdeFrom(h.getClass()))
        .persistent()
        .build();
```

The code compiles fine without any error, but I get a runtime error: io.confluent.examples.streams.WordCountProcessorAPIException …
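
For reference, a sketch of what the old (0.10.x) Stores API expects: `withValues()` needs a Serde that can actually serialize and deserialize a HashMap, not the HashMap class itself or a serde derived from its runtime class. `JsonPOJOSerializer`/`JsonPOJODeserializer` below stand in for serializer/deserializer implementations you would provide yourself (e.g. JSON-backed); they are assumptions, not library classes.

```java
import java.util.HashMap;
import org.apache.kafka.common.serialization.Serde;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.processor.StateStoreSupplier;
import org.apache.kafka.streams.state.Stores;

// Build a Serde from a hand-written serializer/deserializer pair that knows
// how to turn a HashMap<String, Double> into bytes and back (assumed classes).
Serde<HashMap<String, Double>> hashMapSerde = Serdes.serdeFrom(
        new JsonPOJOSerializer<>(),    // hypothetical JSON serializer
        new JsonPOJODeserializer<>()); // hypothetical JSON deserializer

StateStoreSupplier avgStore = Stores.create("Avgs")
        .withKeys(Serdes.String())
        .withValues(hashMapSerde)
        .persistent()
        .build();
```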

Kafka Streams with lookup data on HDFS

╄→гoц情女王★ submitted on 2019-12-05 08:21:39
I'm writing an application with Kafka Streams (v0.10.0.1) and would like to enrich the records I'm processing with lookup data. This data (a timestamped file) is written into an HDFS directory on a daily basis (or 2-3 times a day). How can I load it in the Kafka Streams application and join it to the actual KStream? What would be the best practice for rereading the data from HDFS when a new file arrives there? Or would it be better to switch to Kafka Connect and write the RDBMS table content to a Kafka topic which can be consumed by all the Kafka Streams application instances? Update: As suggested …
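
On the Kafka Connect route the question ends with: once the lookup data lives in a (compacted) topic, a GlobalKTable is the usual join partner, since every application instance receives a full copy of it. Note that GlobalKTable arrived after the 0.10.0.1 used in the question, so the sketch below assumes a newer Streams version; topic names, serdes, and the key-selection logic are placeholders.

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.GlobalKTable;

StreamsBuilder builder = new StreamsBuilder();
// The lookup table, kept up to date by Connect writing into "lookup-topic"
GlobalKTable<String, String> lookup =
        builder.globalTable("lookup-topic", Consumed.with(Serdes.String(), Serdes.String()));

builder.stream("events", Consumed.with(Serdes.String(), Serdes.String()))
       .join(lookup,
             (eventKey, eventValue) -> eventKey,  // map each record to its lookup key
             (eventValue, lookupValue) -> eventValue + "|" + lookupValue) // enrich
       .to("enriched-events");
```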

How to unit test a Kafka Streams application that uses a session window

喜你入骨 submitted on 2019-12-05 07:56:16
I am working with Kafka Streams 2.1. I am trying to write some tests for a stream application that aggregates events by their key (i.e. by a correlation ID) using a session window with an inactivity gap of 300 ms. Here is the aggregation implementation, represented by a method:

```java
private static final int INACTIVITY_GAP = 300;

public KStream<String, AggregatedCustomObject> aggregate(KStream<String, CustomObject> source) {
    return source
            // group by key (i.e. by correlation ID)
            .groupByKey(Grouped.with(Serdes.String(), new CustomSerde()))
            // Define a session window with an inactivity gap of 300 ms
            …
```
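
A sketch of how such a topology is typically tested with the 2.1-era TopologyTestDriver: session windows are event-time driven, so the test controls record timestamps, and a record arriving more than 300 ms after the previous one starts a new session. The `topology()` helper (assumed to build a Topology around the `aggregate` method above), the topic name, and the String serdes are assumptions.

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.StringSerializer;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.TopologyTestDriver;
import org.apache.kafka.streams.test.ConsumerRecordFactory;

Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "session-window-test");
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "dummy:1234"); // never contacted

TopologyTestDriver driver = new TopologyTestDriver(topology(), props);
ConsumerRecordFactory<String, String> factory =
        new ConsumerRecordFactory<>("input", new StringSerializer(), new StringSerializer());

driver.pipeInput(factory.create("input", "corr-1", "event-1", 0L));
driver.pipeInput(factory.create("input", "corr-1", "event-2", 100L));  // within the gap: same session
driver.pipeInput(factory.create("input", "corr-1", "event-3", 1000L)); // gap > 300 ms: new session
// read the output topic here and assert on the aggregated sessions
driver.close();
```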

Kafka Streams: Proper way to exit on error

眉间皱痕 submitted on 2019-12-05 07:36:15
I've been successful in getting a Streams app to consume, transform and produce data, but I've noticed that periodically the streams processor will transition to a state of ERROR and the process will sit there without exiting, showing me logs like:

```
All stream threads have died. The instance will be in error state and should be closed.
```

Is there a way to tell the Streams app to exit once it has reached the ERROR state? Maybe a monitor thread of sorts? I see references in the comments of the Kafka Streams code to the user needing to close the application once it has reached this state; however, I …
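
KafkaStreams#setStateListener is the usual hook for this. A minimal sketch follows; `topology` and `props` are placeholders, and exiting via System.exit is one pragmatic choice among several (a health-check flag or a latch would also work).

```java
import org.apache.kafka.streams.KafkaStreams;

KafkaStreams streams = new KafkaStreams(topology, props);
streams.setStateListener((newState, oldState) -> {
    if (newState == KafkaStreams.State.ERROR) {
        // close() from a fresh thread so we don't block the thread
        // delivering the state-change callback
        new Thread(() -> {
            streams.close();
            System.exit(1); // let the supervisor (systemd, k8s, ...) restart us
        }).start();
    }
});
streams.start();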

Understanding transactions in a Processor implementation in Kafka Streams

半世苍凉 submitted on 2019-12-05 07:28:03
Question: While using the Processor API of Kafka Streams, I use something like this:

```java
context.forward(key, value);
context.commit();
```

What I'm actually doing here is forwarding a state from the state store to the sink every minute (using context.schedule() in the init() method). What I don't understand is this: the [key, value] pair I forward and then commit() is taken from the state store. It is aggregated according to my specific logic from many non-sequential input [key, value] pairs. Each such output [key …
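
For context, a sketch of the pattern the question describes, written against the 2.x Processor API; the store name, value types, and aggregation logic are placeholders:

```java
import java.time.Duration;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.processor.AbstractProcessor;
import org.apache.kafka.streams.processor.ProcessorContext;
import org.apache.kafka.streams.processor.PunctuationType;
import org.apache.kafka.streams.state.KeyValueIterator;
import org.apache.kafka.streams.state.KeyValueStore;

public class ForwardingProcessor extends AbstractProcessor<String, Long> {
    private KeyValueStore<String, Long> store;

    @Override
    @SuppressWarnings("unchecked")
    public void init(ProcessorContext context) {
        super.init(context);
        store = (KeyValueStore<String, Long>) context.getStateStore("agg-store");
        // once a minute, walk the store and forward each aggregated pair
        context.schedule(Duration.ofMinutes(1), PunctuationType.WALL_CLOCK_TIME, timestamp -> {
            try (KeyValueIterator<String, Long> it = store.all()) {
                while (it.hasNext()) {
                    KeyValue<String, Long> entry = it.next();
                    context.forward(entry.key, entry.value);
                }
            }
            // commit() only *requests* a commit: the runtime later commits the
            // input offsets together with flushing the forwarded records, so
            // consumed offsets and produced output move as one unit
            context.commit();
        });
    }

    @Override
    public void process(String key, Long value) {
        store.put(key, value); // the question's aggregation logic would go here
    }
}
```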

Kafka Streams: use the same `application.id` to consume from multiple topics

只谈情不闲聊 submitted on 2019-12-05 03:25:30
I have an application that needs to listen to multiple different topics; each topic has separate logic for how the messages are handled. I had thought to use the same Kafka properties for each KafkaStreams instance, but I get an error like the one below.

Error:

```
java.lang.IllegalArgumentException: Assigned partition my-topic-1 for non-subscribed topic regex pattern; subscription pattern is my-other-topic
```

Code (Kotlin):

```kotlin
class KafkaSetup() {
    companion object {
        private val LOG = LoggerFactory.getLogger(this::class.java)
    }

    fun getProperties(): Properties {
        val properties = Properties()
        properties.put…
```
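
The usual resolution, sketched below: an application.id identifies one logical topology, so rather than several KafkaStreams instances sharing the same ID, build a single topology that subscribes to all topics and attach the per-topic logic inside it. `handleMyTopic`/`handleOther` are placeholder handlers, and `getProperties()` stands for the question's own properties helper.

```java
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;

StreamsBuilder builder = new StreamsBuilder();
// each topic gets its own branch of the same topology
builder.<String, String>stream("my-topic").foreach((k, v) -> handleMyTopic(k, v));
builder.<String, String>stream("my-other-topic").foreach((k, v) -> handleOther(k, v));

// one KafkaStreams instance, one application.id, both subscriptions
KafkaStreams streams = new KafkaStreams(builder.build(), getProperties());
streams.start();
```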

Streaming messages to multiple topics

淺唱寂寞╮ submitted on 2019-12-04 22:03:00
Question: I have a single master topic and multiple predicates, each of which has an output topic associated with it. I want to send each record to ALL topics whose predicate resolves to true. I am using Luwak to test which predicates a record satisfies (to use this library you evaluate a document against a list of predicates and it tells you which ones matched, i.e. I only call it once to get the list of satisfied predicates). I am trying to use Kafka Streams for this, but there doesn't seem to be …
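
One way this is commonly modeled, sketched under the assumption of String keys and values: `branch()` routes each record only to its FIRST matching predicate, but attaching an independent `filter()` + `to()` per predicate sends a record to every topic whose predicate matches. The predicate map below is a stand-in for whatever Luwak reports.

```java
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Predicate;

StreamsBuilder builder = new StreamsBuilder();
KStream<String, String> master = builder.stream("master-topic");

// placeholder predicates; in the question these would come from Luwak matches
Map<String, Predicate<String, String>> routes = new HashMap<>();
routes.put("topic-red",  (k, v) -> v.contains("red"));
routes.put("topic-blue", (k, v) -> v.contains("blue"));

// one filter per predicate: a record matching several predicates
// flows through all of the corresponding branches
routes.forEach((topic, predicate) -> master.filter(predicate).to(topic));
```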

Kafka - problems with TimestampExtractor

南楼画角 submitted on 2019-12-04 21:53:21
Question: I use org.apache.kafka:kafka-streams:0.10.0.1. I'm attempting to work with a time-series-based stream that doesn't seem to trigger KStream.process() ("punctuate"). (See here for reference.) In the KafkaStreams config I'm passing in this param (among others):

```java
config.put(StreamsConfig.TIMESTAMP_EXTRACTOR_CLASS_CONFIG, EventTimeExtractor.class.getName());
```

Here, EventTimeExtractor is a custom timestamp extractor (that implements org.apache.kafka.streams.processor…
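
For reference, a sketch of such an extractor against the 0.10.0.x interface (later versions add a second `previousTimestamp` parameter); the `MyEvent` payload type and its `getTimeStampEpoch()` accessor are assumptions. Worth noting for the punctuate symptom: in these old versions punctuate() is driven by stream time, so it only fires as new records (and thus new extracted timestamps) arrive.

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.streams.processor.TimestampExtractor;

public class EventTimeExtractor implements TimestampExtractor {
    @Override
    public long extract(ConsumerRecord<Object, Object> record) {
        MyEvent event = (MyEvent) record.value(); // assumed deserialized payload type
        // use the event's own timestamp; fall back to the record timestamp
        return event != null ? event.getTimeStampEpoch() : record.timestamp();
    }
}
```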