apache-kafka-streams

Left joining a KStream on another KStream, but only with “latest” results

耗尽温柔 · Submitted on 2019-12-11 07:23:18
Question: I have a data stream on Kafka that I consume as a KStream. Next to it I have a metadata stream that I would like to enrich the data stream with. This is a fairly common scenario, present in several examples. What I haven't solved is the case where the metadata stream contains more than one result for the specified window. What is commonly wanted here is to join against the latest, or last, element from the metadata stream. A sales order, for example, would be materialised once, with the latest…
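
The usual way to get "only the latest" behaviour is to read the metadata topic as a KTable instead of a second KStream: a table keeps exactly one (the most recent) value per key, so a stream-table join always enriches with the latest metadata. A minimal sketch, assuming String serdes and hypothetical topic names orders, order-metadata and enriched-orders:

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Produced;

public class LatestMetadataEnricher {
    public static Topology build() {
        StreamsBuilder builder = new StreamsBuilder();

        // The fact stream: one record per sales-order event.
        KStream<String, String> orders =
            builder.stream("orders", Consumed.with(Serdes.String(), Serdes.String()));

        // Reading the metadata topic as a KTable keeps only the newest value
        // per key, so the join below always sees the latest metadata record.
        KTable<String, String> metadata =
            builder.table("order-metadata", Consumed.with(Serdes.String(), Serdes.String()));

        // Stream-table left join: each order is enriched with the current
        // metadata, or null when none has arrived yet.
        KStream<String, String> enriched = orders.leftJoin(
            metadata,
            (order, meta) -> order + "|" + (meta == null ? "no-metadata" : meta));

        enriched.to("enriched-orders", Produced.with(Serdes.String(), Serdes.String()));
        return builder.build();
    }
}
```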

Kafka Streams failed to delete the state directory - DirectoryNotEmptyException

柔情痞子 · Submitted on 2019-12-11 06:04:30
Question: I noticed the exception stream-thread [x-CleanupThread] Failed to delete the state directory in our Kafka Streams application. The application uses a windowed state store, defined as: Stores.windowStoreBuilder( Stores.persistentWindowStore( storeName, retentionPeriod, retentionWindowSize, false), Serdes.String(), Serdes.String()).withCachingEnabled(); This is not a test issue using the topology test driver; it happens in the actually deployed streams application. Every ten minutes it will try to delete the…
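
The cleanup pass is governed by the state.cleanup.delay.ms setting (default 600000 ms, matching the ten-minute cadence above), and the deletion failure is typically a benign race between the CleanupThread and a still-active task. A hedged sketch of lengthening the delay as a mitigation; the application id and broker address are assumptions:

```java
import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

public class CleanupDelayConfig {
    public static Properties props() {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "windowed-store-app"); // hypothetical
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // assumption
        // The CleanupThread wakes every state.cleanup.delay.ms (default 600000 ms,
        // i.e. ten minutes) and removes state directories of tasks no longer
        // assigned locally; a longer delay makes the delete race rarer.
        props.put(StreamsConfig.STATE_CLEANUP_DELAY_MS_CONFIG, 30 * 60 * 1000L);
        return props;
    }
}
```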

Issue with Kafka Streams filtering

点点圈 · Submitted on 2019-12-11 05:21:24
Question: I'm trying to run a basic app from the following example: https://github.com/confluentinc/examples/blob/3.3.x/kafka-streams/src/main/scala/io/confluent/examples/streams/MapFunctionScalaExample.scala However, I'm getting an exception at this line: // Variant 1: using `mapValues` val uppercasedWithMapValues: KStream[Array[Byte], String] = textLines.mapValues(_.toUpperCase()) Error:(33, 25) missing parameter type for expanded function ((x$1) => x$1.toUpperCase()) textLines.mapValues(_.toUpperCase…
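
The error is the Scala compiler failing to infer the lambda's parameter type against the Java mapValues signature; the usual remedy is to spell out the function type, e.g. an explicit ValueMapper. A sketch of the same pipeline with the mapper type made explicit, shown in Java for brevity (topic names follow the Confluent example):

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.kstream.ValueMapper;

public class MapFunctionExample {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();
        KStream<byte[], String> textLines =
            builder.stream("TextLinesTopic", Consumed.with(Serdes.ByteArray(), Serdes.String()));

        // Spelling out the ValueMapper removes the type-inference ambiguity;
        // in Scala the equivalent is passing an explicit
        // ValueMapper[String, String] (or annotating the lambda parameter).
        KStream<byte[], String> uppercased =
            textLines.mapValues((ValueMapper<String, String>) String::toUpperCase);

        uppercased.to("UppercasedTextLinesTopic",
            Produced.with(Serdes.ByteArray(), Serdes.String()));
    }
}
```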

How do I transform/fork a Kafka stream and send it over to a specific topic?

痴心易碎 · Submitted on 2019-12-11 04:18:59
Question: I am trying to transform the string values in my original stream "textlines" into JSONObject messages in a new stream, newStream, using the function "mapValues", and then stream whatever lands in newStream to a topic called "testoutput". But every time a message actually goes through the transformation block, I get a NullPointerException with a stack trace pointing only into the Kafka Streams libraries. I have no idea what is going on :(( P.S. When I fork/create a new Kafka stream from the original stream, does the…
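
A NullPointerException raised inside the Streams library right after a transformation frequently means the sink cannot serialize the new value type with the default serdes. A hedged sketch that keeps the value as a JSON string (so the String serde still applies) and passes explicit serdes to to(); the JSON envelope shape is an assumption:

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Produced;

public class ForkToTopic {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> textlines =
            builder.stream("textlines", Consumed.with(Serdes.String(), Serdes.String()));

        // Keeping the value a JSON *string* means the String serde below can
        // still serialize it; a JSONObject value would need its own Serde.
        KStream<String, String> newStream =
            textlines.mapValues(line -> "{\"message\":\"" + line + "\"}");

        // Explicit serdes on the sink avoid falling back to mismatched defaults.
        newStream.to("testoutput", Produced.with(Serdes.String(), Serdes.String()));

        // Note: to() does not consume the stream; textlines and newStream can
        // still feed further downstream operators (the "fork" in the P.S.).
    }
}
```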

How to run two or more topologies with the same APPLICATION_ID_CONFIG?

自闭症网瘾萝莉.ら · Submitted on 2019-12-11 04:08:31
Question: I want to run 2 topologies on the same instance. One topology involves a state store and the other involves a global store. How do I do this successfully? I have created 1 topic with 3 partitions, then added a state store in topology 1 and a global store in topology 2. Topology 1: public void createTopology() { Topology topology = new Topology(); topology.addSource("source", new KeyDeserializer(), new ValueDeserializer(), "topic1"); topology.addProcessor("processor1", new CustomProcessorSupplier1(),…
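
A KafkaStreams instance runs exactly one topology, and two instances must not share an application.id, since the id names the consumer group and the internal topics. One sketch of a way out: run both topologies in one JVM under distinct ids (buildStateStoreTopology and buildGlobalStoreTopology are hypothetical stand-ins for the two topologies in the question):

```java
import java.util.Properties;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.Topology;

public class TwoTopologies {
    public static void main(String[] args) {
        // Each KafkaStreams instance gets its own application.id: the id names
        // the consumer group and the internal changelog/repartition topics, so
        // two instances sharing one id would trample each other.
        KafkaStreams app1 = new KafkaStreams(buildStateStoreTopology(), configFor("state-store-app"));
        KafkaStreams app2 = new KafkaStreams(buildGlobalStoreTopology(), configFor("global-store-app"));
        app1.start();
        app2.start();
    }

    static Properties configFor(String applicationId) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, applicationId);
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumption
        return props;
    }

    // Hypothetical stand-ins for the question's two topologies.
    static Topology buildStateStoreTopology() { return new Topology().addSource("s1", "topic1"); }
    static Topology buildGlobalStoreTopology() { return new Topology().addSource("s2", "topic1"); }
}
```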

Is a Kafka Streams processor thread-safe?

故事扮演 · Submitted on 2019-12-11 03:19:19
Question: I know this question was asked before here: Kafka Streaming Concurrency? But this is still very strange to me. According to the documentation (or maybe I am missing something), each partition has a task, meaning a different instance of the processor, and each task is executed by a different thread. But when I tested it, I saw that different threads can get different instances of the processor. Therefore, if you want to keep any in-memory state (the old-fashioned way) in your processor, you must lock? Example…
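
The contract, as documented, is that ProcessorSupplier.get() is called once per task, and a given task is only ever driven by one thread at a time, so fields on the processor instance itself need no locking; only state shared across instances does. A sketch of the distinction (class names are made up):

```java
import java.util.concurrent.atomic.AtomicLong;
import org.apache.kafka.streams.processor.AbstractProcessor;
import org.apache.kafka.streams.processor.Processor;
import org.apache.kafka.streams.processor.ProcessorSupplier;

public class CountingSupplier implements ProcessorSupplier<String, String> {

    // Shared across ALL processor instances, and therefore across threads:
    // this must be thread-safe (or guarded by a lock).
    private final AtomicLong globalCount = new AtomicLong();

    @Override
    public Processor<String, String> get() {
        // Called once per task, so every task gets its own fresh instance.
        return new AbstractProcessor<String, String>() {
            // Per-instance state: a task is driven by one thread at a time,
            // so a plain field needs no locking.
            private long localCount = 0;

            @Override
            public void process(String key, String value) {
                localCount++;
                globalCount.incrementAndGet();
            }
        };
    }
}
```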

Can Kafka Streams output topic be on a separate cluster?

北城余情 · Submitted on 2019-12-11 01:30:06
Question: I have a centralized topic that all logs are pushed to, but I would like to filter some of those records out to a separate topic, and a separate cluster if possible. Thanks. Answer 1: Kafka Streams does not allow creating a stream whose source and output topics are on different Kafka clusters, so the following code will not work for you: streamsBuilder.stream(sourceTopicName).filter(..).to(outputTopicName) In this case it expects outputTopicName to be on the same cluster as sourceTopicName. As a…
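
The truncated answer presumably continues with the standard workarounds: write the filtered records to a topic on the source cluster and mirror it (MirrorMaker or Confluent Replicator), or ship them yourself with a plain KafkaProducer pointed at the second cluster. A hedged sketch of the producer route; topic names, the filter predicate and broker addresses are assumptions:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;

public class CrossClusterFilter {
    public static void main(String[] args) {
        // A plain producer aimed at the *other* cluster.
        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", "other-cluster:9092"); // assumption
        producerProps.put("key.serializer",
            "org.apache.kafka.common.serialization.StringSerializer");
        producerProps.put("value.serializer",
            "org.apache.kafka.common.serialization.StringSerializer");
        KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps);

        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("central-logs", Consumed.with(Serdes.String(), Serdes.String()))
            .filter((key, value) -> value.contains("ERROR")) // example predicate
            // foreach is terminal: each surviving record is shipped by hand,
            // trading Streams' delivery guarantees for the cross-cluster write.
            .foreach((key, value) ->
                producer.send(new ProducerRecord<>("filtered-logs", key, value)));

        // ... then build and start the KafkaStreams instance as usual.
    }
}
```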

Set timestamp in output with Kafka Streams fails for transformations

為{幸葍}努か · Submitted on 2019-12-11 00:57:37
Question: Suppose we have a transformer (written in Scala): new Transformer[String, V, (String, V)]() { var context: ProcessorContext = _ override def init(context: ProcessorContext): Unit = { this.context = context } override def transform(key: String, value: V): (String, V) = { val timestamp = toTimestamp(value) context.forward(key, value, To.all().withTimestamp(timestamp)) key -> value } override def close(): Unit = () } where toTimestamp is just a function which returns a timestamp fetched from…
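
A likely culprit in this pattern: transform both calls context.forward(...) and returns key -> value, so the record is emitted twice, and the returned copy keeps the input timestamp rather than the forwarded one. A sketch of the forward-and-return-null variant, written in Java with String values and a made-up toTimestamp:

```java
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.kstream.Transformer;
import org.apache.kafka.streams.processor.ProcessorContext;
import org.apache.kafka.streams.processor.To;

public class TimestampTransformer
        implements Transformer<String, String, KeyValue<String, String>> {

    private ProcessorContext context;

    @Override
    public void init(ProcessorContext context) {
        this.context = context;
    }

    @Override
    public KeyValue<String, String> transform(String key, String value) {
        long timestamp = toTimestamp(value);
        // Forward explicitly so the record carries the extracted timestamp...
        context.forward(key, value, To.all().withTimestamp(timestamp));
        // ...and return null: a non-null return would be forwarded a second
        // time, carrying the input timestamp instead.
        return null;
    }

    @Override
    public void close() {}

    // Made-up stand-in for the question's toTimestamp.
    private long toTimestamp(String value) {
        return Long.parseLong(value.substring(0, 13)); // assumption: epoch-millis prefix
    }
}
```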

Streaming from particular partition within a topic (Kafka Streams)

徘徊边缘 · Submitted on 2019-12-10 21:54:47
Question: As far as I understand after reading the Kafka Streams documentation, it's not possible to use it to stream data from only one partition of a given topic; one always has to read the topic in full. Is that correct? If so, are there any plans to add such an option to the API in the future? Answer 1: No, you can't do that, because the internal consumer subscribes to the topic as part of a consumer group (specified through the application.id), so the partitions are assigned automatically. Btw, why do you…
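
When a single partition really is needed, the plain consumer API can do what Streams cannot: assign() pins a consumer to explicit partitions and bypasses group management entirely. A minimal sketch; the topic name, partition number and broker address are assumptions:

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class SinglePartitionReader {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumption
        props.put("key.deserializer",
            "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
            "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // assign() pins the consumer to exactly one partition; no consumer
            // group, no automatic assignment, unlike Kafka Streams.
            consumer.assign(Collections.singletonList(new TopicPartition("my-topic", 0)));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
                }
            }
        }
    }
}
```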

Kafka Streams: java.nio.file.DirectoryNotEmptyException

孤者浪人 · Submitted on 2019-12-10 18:45:30
Question: We have an issue with deleting the state directory within a Kafka Streams application. We are running the application on an in-house container platform. Insight into this issue would be much appreciated. The log of the exception: 2018-09-18 09:26:09.112 INFO 1 --- [5-CleanupThread] o.a.k.s.p.internals.StateDirectory : stream-thread [ApplicationName-1ae22d38-32d3-451a-b039-372c79b2e6a5-CleanupThread] Deleting obsolete state directory 2_1 for task 2_1 as 601112ms has elapsed (cleanup delay is 600000ms).…
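
On container platforms the default state directory under /tmp is a common aggravating factor, since other tenants or volume mounts can interfere with the CleanupThread's recursive delete. A hedged sketch that points state.dir at a dedicated writable volume; the path and broker address are assumptions:

```java
import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

public class StateDirConfig {
    public static Properties props() {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "ApplicationName");   // as in the log
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumption
        // Keep state off the shared /tmp: a dedicated, writable, per-container
        // volume keeps the CleanupThread's recursive delete from racing other
        // tenants of the directory.
        props.put(StreamsConfig.STATE_DIR_CONFIG, "/var/lib/kafka-streams");  // assumption
        return props;
    }
}
```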