apache-kafka-streams

How to display intermediate results in a windowed streaming ETL?

二次信任 submitted on 2021-02-08 07:42:02
Question: We currently do a real-time aggregation of data in an event store. The idea is to visualize transaction data for multiple time ranges (monthly, weekly, daily, hourly) and for multiple nominal keys. We regularly have late data, so we need to account for that. Furthermore, the requirement is to display "running" results, that is, the value of the current window even before it is complete. Currently we are using Kafka and Apache Storm (specifically Trident, i.e. micro-batches) to do this. Our…
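The question's current stack is Storm/Trident, but on this page's topic, Kafka Streams, the "running result" requirement is the default behavior: a windowed aggregation forwards a refined result on every incoming record, including late records that arrive within the grace period. A minimal Java sketch, assuming String keys, Long transaction amounts, and invented topic names:

```java
import java.time.Duration;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.kstream.TimeWindows;

StreamsBuilder builder = new StreamsBuilder();
builder.stream("transactions", Consumed.with(Serdes.String(), Serdes.Long()))
       .groupByKey(Grouped.with(Serdes.String(), Serdes.Long()))
       // hourly windows; keep accepting late records for up to 2 days
       .windowedBy(TimeWindows.of(Duration.ofHours(1)).grace(Duration.ofDays(2)))
       .reduce(Long::sum, Materialized.as("hourly-totals"))
       // every update is forwarded immediately, so downstream consumers
       // see the "running" value of each still-open window
       .toStream((windowedKey, total) -> windowedKey.key() + "@" + windowedKey.window().start())
       .to("hourly-totals-output", Produced.with(Serdes.String(), Serdes.Long()));
```

If only final results were wanted instead, `suppress(Suppressed.untilWindowCloses(...))` inverts this default.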

Collection of lambda functions in Java Streams

泄露秘密 submitted on 2021-02-08 05:31:14
Question: I have the stream function `KStream<K, V>[] branch(final Predicate<? super K, ? super V>... predicates)`. I wanted to create a list of predicates dynamically. Is that possible?

```java
KStream<Long, AccountMigrationEvent>[] branches = stream
        .map((key, event) -> enrich(key, event))
        .branch(getStrategies());

[...]

private List<org.apache.kafka.streams.kstream.Predicate<Long, AccountMigrationEvent>> getStrategies() {
    ArrayList<Predicate<Long, AccountMigrationEvent>> predicates = new ArrayList<>();
    for …
```
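This works because `branch(...)` is a varargs method: build the `List`, then hand it over with `toArray`. A sketch assuming a hypothetical `MigrationStrategy` enum drives the branching (every name besides `AccountMigrationEvent` and `Predicate` is invented for illustration):

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.kafka.streams.kstream.Predicate;

private Predicate<Long, AccountMigrationEvent>[] getStrategies() {
    List<Predicate<Long, AccountMigrationEvent>> predicates = new ArrayList<>();
    for (MigrationStrategy strategy : MigrationStrategy.values()) {   // hypothetical enum
        predicates.add((key, event) -> event.getStrategy() == strategy);
    }
    predicates.add((key, event) -> true);   // catch-all so no record is silently dropped
    // branch(...) takes varargs, so the list can be passed as an array
    @SuppressWarnings("unchecked")
    Predicate<Long, AccountMigrationEvent>[] result = predicates.toArray(new Predicate[0]);
    return result;
}
```

With that, the question's `stream.map(...).branch(getStrategies())` call compiles as written.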

Events that should be emitted by a KTable

拜拜、爱过 submitted on 2021-02-08 03:42:07
Question: I am trying to test a topology that has a KTable as its last node. My test uses a full-blown Kafka cluster (through Confluent's Docker images), so I am not using the TopologyTestDriver. My topology has input key-value types String -> Customer and output String -> CustomerMapped. The serdes, schemas, and integration with Schema Registry all work as expected. I am using Scala, Kafka 2.2.0, Confluent Platform 5.2.1 and kafka-streams-scala. My topology, as simplified as possible,…
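Worth knowing when asserting on a KTable's output: the stream of change events you observe depends on record caching. With the cache enabled (the default), several upstream updates to the same key can be folded into a single downstream event, so a test that counts emitted records may see fewer than expected. A Java sketch of the two knobs (the question's topology is Scala; store and type names here are illustrative):

```java
import java.util.Properties;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.state.KeyValueStore;

// Per-store: disable the record cache so every update is forwarded downstream.
Materialized<String, CustomerMapped, KeyValueStore<Bytes, byte[]>> store =
        Materialized.<String, CustomerMapped, KeyValueStore<Bytes, byte[]>>as("customer-mapped-store")
                    .withCachingDisabled();

// Or globally, for the whole application:
Properties props = new Properties();
props.put(StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG, 0);
```

Even with caching disabled, KTable semantics only guarantee the latest state per key, so tests are more robust asserting on final state than on an exact event count.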

Kafka Streams: RocksDB TTL

可紊 submitted on 2021-02-07 20:46:14
Question: I understand that the default TTL is set to infinity (non-positive). However, if we need to retain data in the store for a maximum of 2 days, can we override it through a RocksDBConfigSetter interface implementation, i.e. options.setWalTtlSeconds(172800)? Or would that conflict with the Kafka Streams internals? Ref: https://docs.confluent.io/current/streams/developer-guide/config-streams.html#streams-developer-guide-rocksdb-config

Answer 1: This is currently not possible. Kafka Streams disables…
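Since the WAL-based TTL is off the table, the closest supported way to drop data after roughly 2 days is to make the store itself time-bounded: use a windowed aggregation and set the retention on the `Materialized` store. A sketch, assuming a String-keyed, Long-valued stream (2 days = the question's 172800 seconds; retention must cover window size plus grace):

```java
import java.time.Duration;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.TimeWindows;
import org.apache.kafka.streams.state.WindowStore;

stream.groupByKey()
      // daily windows, half a day of grace for late records
      .windowedBy(TimeWindows.of(Duration.ofDays(1)).grace(Duration.ofHours(12)))
      .reduce((oldValue, newValue) -> newValue,
              Materialized.<String, Long, WindowStore<Bytes, byte[]>>as("bounded-store")
                          // old windows are purged once they fall out of retention
                          .withRetention(Duration.ofDays(2)));
```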

Retention time in Kafka local state store / changelog

大憨熊 submitted on 2021-02-07 10:10:33
Question: I'm using Kafka and Kafka Streams as part of Spring Cloud Stream. The data flowing through my Kafka Streams app is being aggregated and materialized over certain time windows:

```java
Materialized<String, ErrorScore, WindowStore<Bytes, byte[]>> oneHour =
        Materialized.as("one-hour-store");
oneHour.withLoggingEnabled(topicConfig);

events
    .map(getStringSensorMeasurementKeyValueKeyValueMapper())
    .groupByKey()
    .windowedBy(TimeWindows.of(oneHourStore.getTimeUnit()))
    .reduce((aggValue, newValue) -> …
```
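For reference, retention for a windowed store and its changelog topic can both be pinned explicitly on the `Materialized` object. A sketch extending the question's own snippet (`ErrorScore` and `topicConfig` come from the question; the seven-day figure is an arbitrary example):

```java
import java.time.Duration;
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.state.WindowStore;

// Extra configs applied to the changelog topic backing the store.
Map<String, String> topicConfig = new HashMap<>();
topicConfig.put("retention.ms", String.valueOf(Duration.ofDays(7).toMillis()));

Materialized<String, ErrorScore, WindowStore<Bytes, byte[]>> oneHour =
        Materialized.<String, ErrorScore, WindowStore<Bytes, byte[]>>as("one-hour-store")
                    .withRetention(Duration.ofDays(7))   // local window store retention
                    .withLoggingEnabled(topicConfig);    // changelog topic configs
```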

Kafka Join not firing after re-key

烈酒焚心 submitted on 2021-02-07 09:44:03
Question: I'm working on a Kafka Streams application written in Kotlin, and I'm seeing some bizarre behavior with a join. At a high level, I'm streaming two topics with different keys. However, I can rekey one of the messages so that the keys line up. After I do this, though, the subsequent join never fires. Below is the simplified code (with irrelevant portions elided and replaced with comments):

```kotlin
val builder = KStreamBuilder()
val joinWindow = JoinWindows.of(/* 30 days */).until(/* 365 …
```
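Two things have to hold for a KStream-KStream join to fire, and both are easy to miss after a rekey: the topics must be co-partitioned (same partition count), and the two records' timestamps must fall within the `JoinWindows` of each other. A minimal Java sketch of the rekey-then-join shape, written against the newer StreamsBuilder API (the question uses the older KStreamBuilder; topic names, types, and serdes here are invented):

```java
import java.time.Duration;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.JoinWindows;
import org.apache.kafka.streams.kstream.Joined;
import org.apache.kafka.streams.kstream.KStream;

StreamsBuilder builder = new StreamsBuilder();
KStream<String, Order> orders = builder.stream("orders");
KStream<String, Shipment> shipments = builder
        .<String, Shipment>stream("shipments")
        // rekey so the keys line up; Streams inserts a repartition topic here
        .selectKey((oldKey, shipment) -> shipment.getOrderId());

orders.join(shipments,
            (order, shipment) -> new OrderWithShipment(order, shipment),
            JoinWindows.of(Duration.ofDays(30)),
            Joined.with(Serdes.String(), orderSerde, shipmentSerde))
      .to("orders-with-shipments");
```

With windows as wide as the question's, it is also worth confirming that `until(...)` (the store retention) is at least the window size and that records on both sides actually carry overlapping timestamps.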
