apache-kafka-streams

Kafka Streams API: KStream to KTable

陌路散爱 submitted on 2019-12-02 21:57:59
I have a Kafka topic where I send location events (key=user_id, value=user_location). I am able to read and process it as a KStream:

    KStreamBuilder builder = new KStreamBuilder();
    KStream<String, Location> locations = builder
        .stream("location_topic")
        .map((k, v) -> {
            // some processing here, omitted for clarity
            Location location = new Location(lat, lon);
            return new KeyValue<>(k, location);
        });

That works well, but I'd like to have a KTable with the last known position of each user. How could I do it? I am able to do it writing to and reading from an intermediate topic:

    // write to
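A minimal sketch of one way to do this without an explicit intermediate topic, assuming the locations stream above and a configured Serde for Location: group by key and reduce, always keeping the newest value per user.

    // reduce to the latest value per key, yielding a KTable of last known positions
    KTable<String, Location> lastKnownLocation = locations
        .groupByKey()
        .reduce((previous, latest) -> latest); // keep only the most recent location per user_id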

KStream send record to multiple streams (not Branch)

匆匆过客 submitted on 2019-12-02 13:52:08
Question: Is there a way to make a branch-like operation but place the record in each output stream whose predicate evaluates to true? Branch puts the record into only the first match (documentation: A record is placed to one and only one output stream on the first match). Answer 1: You can "broadcast" and filter each stream individually:

    KStream stream = ...
    stream1 = stream.filter(...);
    stream2 = stream.filter(...);
    // and so on...

If you use the stream variable multiple times, all records are broadcast to all downstream
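A fuller sketch of this broadcast-and-filter pattern, with hypothetical topic names and predicates; every filter sees every record, so a record matching several predicates lands in several output streams:

    KStream<String, String> stream = builder.stream("input_topic");
    KStream<String, String> shortValues = stream.filter((k, v) -> v.length() < 20);
    KStream<String, String> flagged = stream.filter((k, v) -> v.contains("flag"));
    // a short value containing "flag" passes both predicates
    // and appears in both shortValues and flagged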

Kafka Streams - How to better control partitioning of internally created state store topic?

好久不见. submitted on 2019-12-02 09:49:34
State stores in Kafka Streams are created internally. State stores are partitioned by key, but do not allow partitioning other than by key (to my knowledge). QUESTIONS: How do you control the number of partitions of a state store's internally created topic? How does the state store topic infer the number of partitions and the partitioning to use by default, and how can this be overridden? How can you work around this if you want to partition your state store by something other than the key of the incoming key-value record and still have co-partitioning? In this case, I'd like to partition by something more
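By default a state store's changelog topic inherits the partition count of the sub-topology's input topic. One possible workaround (a sketch, assuming Kafka Streams 2.4+ and a hypothetical Event type with a getRegion() accessor): re-key the stream and force an explicit repartition before the stateful step, so the store is sharded by the new key with a partition count you choose.

    KStream<String, Event> events = builder.stream("events_topic");
    events
        .selectKey((k, v) -> v.getRegion())                          // partition by region, not the original key
        .repartition(Repartitioned.<String, Event>as("events-by-region")
            .withNumberOfPartitions(12))                             // explicit partition count
        .groupByKey()
        .count();                                                    // state store is now sharded by region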

What should be the replication factor of changelog/repartition topics

孤者浪人 submitted on 2019-12-02 09:08:18
Question: I know it is possible to configure the replication factor of these internal topics for Kafka Streams. Our application uses replication factor 3 for normal application topics, but until now I haven't configured the replication factor for the changelog/repartition topics; my assumption was that if one broker dies (or the leader changes for some reason) the Kafka Streams application will automatically rebalance to the new leader. Now I am not so sure that a running Kafka Streams application can rebalance to
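By default Streams creates its internal topics with replication factor 1, so a broker failure can make a changelog/repartition topic unavailable no matter how leader election behaves. A minimal sketch of raising it to match the application topics (broker address and application id are hypothetical):

    Properties props = new Properties();
    props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-app");
    props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");
    // applies to all internal topics (changelog and repartition) created by Streams
    props.put(StreamsConfig.REPLICATION_FACTOR_CONFIG, 3);

Note that this only affects internal topics created after the change; topics that already exist must be altered with the broker-side tools.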

Kafka streams exactly once delivery

☆樱花仙子☆ submitted on 2019-12-02 08:16:52
My goal is to consume from topic A, do some processing, and produce to topic B, as a single atomic action. To achieve this I see two options: Use a spring-kafka @KafkaListener and a KafkaTemplate as described here. Use Streams EOS (exactly-once) functionality. I have successfully verified option #1. By successfully, I mean that if my processing fails (an IllegalArgumentException is thrown) the consumed message from topic A keeps being consumed by the KafkaListener. This is what I expect, as the offset is not committed and the DefaultAfterRollbackProcessor is used. I am expecting to see the same
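For option #2, a sketch of the configuration that enables exactly-once semantics in a Streams application (the constant below is the pre-2.6 name; later releases add EXACTLY_ONCE_V2):

    Properties props = new Properties();
    // consume from A, process, and produce to B inside one Kafka transaction;
    // on failure the transaction is aborted and the input offsets are not committed
    props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE);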

Embedded Kafka: KTable+KTable leftJoin produces duplicate records

五迷三道 submitted on 2019-12-02 08:16:10
Question: I come seeking knowledge of the arcane. First, I have two pairs of topics, with one topic in each pair feeding into the other topic. Two KTables are formed from the latter topics, and they are used in a KTable+KTable leftJoin. The problem is that the leftJoin produces THREE records when I produce a single record to either KTable. I would expect two records of the form (A-null, A-B), but instead I get (A-null, A-B, A-null). I have confirmed that the KTables are receiving a single record each. I have
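A minimal sketch of the join described, with hypothetical topic names, to make the expected behavior concrete: a KTable-KTable leftJoin emits a result whenever either side is updated, so one update per side should normally yield one output record each.

    KTable<String, String> left = builder.table("left_topic");
    KTable<String, String> right = builder.table("right_topic");
    KTable<String, String> joined = left.leftJoin(right,
        (l, r) -> l + "-" + (r == null ? "null" : r)); // right side is null until it arrives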

How to process a KStream in a batch of max size or fallback to a time window?

旧巷老猫 submitted on 2019-12-02 05:46:55
I would like to create a Kafka Streams-based application that processes a topic and takes messages in batches of size X (e.g., 50), but if the stream has low flow, gives me whatever the stream has within Y seconds (e.g., 5). So, instead of processing messages one by one, I process a List[Record] where the size of the list is 50 (or maybe less). This is to make some I/O-bound processing more efficient. I know that this can be implemented with the classic Kafka API, but I was looking for a stream-based implementation that can also handle offset committing natively, taking errors/failures into account
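The DSL has no batch operator; one common workaround (a sketch under at-least-once semantics, with a hypothetical Record type) is a Transformer that buffers records and flushes either when the batch reaches X or when a wall-clock punctuator fires after Y seconds:

    stream.transform(() -> new Transformer<String, Record, KeyValue<String, List<Record>>>() {
        private final List<Record> buffer = new ArrayList<>();
        private ProcessorContext context;

        @Override
        public void init(ProcessorContext context) {
            this.context = context;
            // flush whatever has accumulated every 5 seconds of wall-clock time
            context.schedule(Duration.ofSeconds(5), PunctuationType.WALL_CLOCK_TIME, ts -> flush());
        }

        @Override
        public KeyValue<String, List<Record>> transform(String key, Record value) {
            buffer.add(value);
            if (buffer.size() >= 50) {
                flush();
            }
            return null; // batches are emitted via forward() in flush()
        }

        private void flush() {
            if (!buffer.isEmpty()) {
                context.forward("batch", new ArrayList<>(buffer)); // hypothetical batch key
                buffer.clear();
            }
        }

        @Override
        public void close() { }
    });

Note that the in-memory buffer is lost on a crash, so records may be re-delivered; for stronger guarantees the buffer would have to live in a state store.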

Is Kafka Stream StateStore global over all instances or just local?

南笙酒味 submitted on 2019-12-01 16:14:06
In the Kafka Streams WordCount example, a StateStore is used to store word counts. If there are multiple instances in the same consumer group, is the StateStore global to the group, or just local to a consumer instance? Thanks

Matthias J. Sax: This depends on your view of a state store. In Kafka Streams the state is sharded, and thus each instance holds part of the overall application state. For example, stateful DSL operators use a local RocksDB instance to hold their shard of the state. Thus, in this regard the state is local. On the other hand, all changes to the state are written into a Kafka
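A short sketch of what "local" means in practice, assuming the WordCount store is named "counts-store" and using the pre-2.5 store() signature: each instance can only query the keys that hash to its own partitions.

    ReadOnlyKeyValueStore<String, Long> store =
        streams.store("counts-store", QueryableStoreTypes.keyValueStore());
    Long count = store.get("hello"); // null if "hello" is held by another instance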

Kafka Streams 2.1.1 class cast while flushing timed aggregation to store

与世无争的帅哥 submitted on 2019-12-01 12:56:02
I'm trying to use Kafka Streams to perform a windowed aggregation and emit the result only after a certain session window is closed. To achieve this I'm using the suppress function. The problem is that I can't find a way to make this simple test work, because when it tries to persist the state I get a class cast exception: it tries to cast Windowed to String. I have tried to provide the aggregate function with a Materialized<Windowed<String>,Long,StateStore<>>, but it doesn't type-check because it expects the first type parameter to be simply String. What am I missing here? Kafka version 2.1.1
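A sketch of the shape that usually compiles and avoids the cast (based on similar reports, with a hypothetical gap and serdes): the Materialized key type is the grouping key, String, not Windowed<String>, and supplying serdes explicitly keeps suppress from falling back to a default serde of the wrong type.

    KTable<Windowed<String>, Long> closed = stream
        .groupByKey(Grouped.with(Serdes.String(), Serdes.Long()))
        .windowedBy(SessionWindows.with(Duration.ofSeconds(30)))      // hypothetical inactivity gap
        .count(Materialized.with(Serdes.String(), Serdes.Long()))     // key serde is String, not Windowed
        .suppress(Suppressed.untilWindowCloses(Suppressed.BufferConfig.unbounded()));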