apache-kafka-streams

How to Handle Different Timezone in Kafka Streams?

二次信任 Submitted on 2019-11-28 05:51:55
Question: I was evaluating Kafka Streams to see whether it fits my use case: I need to aggregate sensor data every 15 minutes, hourly, and daily, and the windowing feature looked useful, since I can create windows by applying windowedBy() on a KGroupedStream. The problem is that the windows are created in UTC, and I want my data to be grouped by its originating timezone, not by UTC, as that hampers the aggregation. Can anyone help me with this? Answer 1: You can
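The answer is truncated above, so as a hedged sketch (not necessarily the answer that was given): a commonly cited workaround is to shift record timestamps by the zone offset, e.g. in a custom TimestampExtractor, so that Kafka Streams' epoch-aligned (UTC) windows line up with local day boundaries. The offset arithmetic with java.time might look like this; the zone name is just an example:

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;

public class ZoneShift {
    // Shift an epoch-millis timestamp by the zone's UTC offset so that
    // epoch-aligned daily windows coincide with local midnight.
    // (Sketch only; in a real topology this would run inside a
    // custom TimestampExtractor.)
    public static long shiftToZone(long epochMillis, ZoneId zone) {
        ZoneOffset offset = zone.getRules().getOffset(Instant.ofEpochMilli(epochMillis));
        return epochMillis + offset.getTotalSeconds() * 1000L;
    }
}
```

Note that the shifted timestamps are only meaningful for window alignment; the window boundaries reported downstream would need to be shifted back.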

UnsatisfiedLinkError: /tmp/snappy-1.1.4-libsnappyjava.so Error loading shared library ld-linux-x86-64.so.2: No such file or directory

倾然丶 夕夏残阳落幕 Submitted on 2019-11-27 16:03:53
Question: I am trying to run a Kafka Streams application in Kubernetes. When I launch the pod I get the following exception: Exception in thread "streams-pipe-e19c2d9a-d403-4944-8d26-0ef27ed5c057-StreamThread-1" java.lang.UnsatisfiedLinkError: /tmp/snappy-1.1.4-5cec5405-2ce7-4046-a8bd-922ce96534a0-libsnappyjava.so: Error loading shared library ld-linux-x86-64.so.2: No such file or directory (needed by /tmp/snappy-1.1.4-5cec5405-2ce7-4046-a8bd-922ce96534a0-libsnappyjava.so) at java.lang.ClassLoader
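The answer here is cut off, but this error pattern typically means the container image (often Alpine-based, which uses musl rather than glibc) lacks ld-linux-x86-64.so.2, which the bundled native snappy library needs. One commonly cited fix, assuming a hypothetical Alpine-based image, is to install the glibc compatibility layer:

```dockerfile
# Hypothetical Alpine-based image; base image and jar name are examples.
# libc6-compat provides ld-linux-x86-64.so.2 for glibc-linked native libs.
FROM openjdk:8-jre-alpine
RUN apk add --no-cache libc6-compat
COPY app.jar /app.jar
CMD ["java", "-jar", "/app.jar"]
```

Alternatively, switching to a glibc-based base image avoids the issue entirely.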

How to connect to multiple clusters in a single Kafka Streams application?

青春壹個敷衍的年華 Submitted on 2019-11-27 15:07:31
The Kafka Streams Developer Guide says: "Kafka Streams applications can only communicate with a single Kafka cluster specified by this config value. Future versions of Kafka Streams will support connecting to different Kafka clusters for reading input streams and writing output streams." Does this mean that my whole application can only connect to a single Kafka cluster, or that each KafkaStreams instance can only connect to a single cluster? Could I create multiple KafkaStreams instances with different properties that connect to different clusters? It means that a single application can only
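The truncated answer suggests the limit is per instance, and that multiple KafkaStreams instances with different configs are possible. A minimal sketch of building per-cluster configs (the bootstrap server addresses are hypothetical; only java.util.Properties is used, so no broker is needed to run it):

```java
import java.util.Properties;

public class MultiClusterProps {
    // Build a Streams config per cluster. "application.id" and
    // "bootstrap.servers" are the real config keys that
    // StreamsConfig.APPLICATION_ID_CONFIG / BOOTSTRAP_SERVERS_CONFIG resolve to.
    public static Properties streamsConfig(String appId, String bootstrapServers) {
        Properties props = new Properties();
        props.put("application.id", appId);
        props.put("bootstrap.servers", bootstrapServers);
        return props;
    }
}
```

Each Properties object would then back its own `new KafkaStreams(topology, props)` instance, one per cluster.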

Dynamically connecting a Kafka input stream to multiple output streams

帅比萌擦擦* Submitted on 2019-11-27 14:44:09
Is there functionality built into Kafka Streams that allows a single input stream to be dynamically connected to multiple output streams? KStream.branch allows branching based on true/false predicates, but that isn't quite what I want. I'd like each incoming log to determine, at runtime, the topic it will be streamed to; e.g., a log {"date": "2017-01-01"} would be streamed to the topic topic-2017-01-01, and a log {"date": "2017-01-02"} to the topic topic-2017-01-02. I could call forEach on the stream and then write to a Kafka producer, but that doesn't seem very elegant. Is there a
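Worth noting: later Kafka Streams releases (2.0+) added an overload of KStream#to that takes a TopicNameExtractor, which computes the destination topic per record; the question predates this. The routing logic itself, matching the question's example, can be sketched as a pure function (the naive regex-based date extraction is an illustration, not a recommendation over a real JSON parser):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DateTopicRouter {
    private static final Pattern DATE = Pattern.compile("\"date\"\\s*:\\s*\"([^\"]+)\"");

    // Map a log line like {"date": "2017-01-01"} to topic "topic-2017-01-01",
    // as described in the question. This is the body a TopicNameExtractor
    // implementation would delegate to.
    public static String topicFor(String json) {
        Matcher m = DATE.matcher(json);
        if (!m.find()) {
            throw new IllegalArgumentException("no date field in record: " + json);
        }
        return "topic-" + m.group(1);
    }
}
```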

Kafka Streaming Concurrency?

跟風遠走 Submitted on 2019-11-27 13:40:57
Question: I have some basic Kafka Streams code that reads records from one topic, does some processing, and outputs records to another topic. How does Kafka Streams handle concurrency? Is everything run in a single thread? I don't see this mentioned in the documentation. If it's single-threaded, I would like options for multi-threaded processing to handle high data volumes. If it's multi-threaded, I need to understand how that works and how to handle resources, like SQL database connections
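For reference, Kafka Streams runs one StreamThread per instance by default; the real config key `num.stream.threads` raises that. A small helper sketch (pure java.util.Properties, so it runs without a broker; the base config contents are an example):

```java
import java.util.Properties;

public class ThreadedConfig {
    // "num.stream.threads" is the actual Streams config key controlling the
    // number of StreamThreads per KafkaStreams instance (default 1).
    // Partitions of the input topics are divided among these threads.
    public static Properties withThreads(Properties base, int threads) {
        Properties props = new Properties();
        props.putAll(base);
        props.put("num.stream.threads", Integer.toString(threads));
        return props;
    }
}
```

Since each thread processes its own tasks, per-thread resources such as SQL connections are typically created lazily inside Processor#init rather than shared across threads.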

Handling bad messages using Kafka's Streams API

白昼怎懂夜的黑 Submitted on 2019-11-27 10:53:12
I have a basic stream-processing flow that looks like master topic -> my processing in a mapper/filter -> output topics, and I am wondering about the best way to handle "bad messages". These could be messages I can't deserialize properly, or cases where the processing/filtering logic fails in some unexpected way (I have no external dependencies, so there should be no transient errors of that sort). I was considering wrapping all my processing/filtering code in a try/catch and, if an exception is raised, routing the message to an "error topic". Then I can study the message and
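The try/catch-and-route idea the question describes (a dead-letter-queue pattern) can be sketched as a generic wrapper; here a List stands in for the producer send to the "error topic", so the sketch runs without Kafka:

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

public class SafeMapper {
    // Wrap a mapping function so a failure captures the bad input
    // instead of crashing the stream thread. In a real topology the
    // deadLetters sink would be a producer writing to an error topic.
    public static <I, O> Function<I, Optional<O>> safe(Function<I, O> f,
                                                       List<I> deadLetters) {
        return in -> {
            try {
                return Optional.of(f.apply(in));
            } catch (RuntimeException e) {
                deadLetters.add(in);
                return Optional.empty();
            }
        };
    }
}
```

In a KStream pipeline this shape composes naturally with flatMapValues, which drops the empty results.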

How to filter keys and value with a Processor using Kafka Stream DSL

南笙酒味 Submitted on 2019-11-27 07:48:08
Question: I have a Processor that interacts with a StateStore to filter messages and apply complex logic to them. In the process(key, value) method I use context.forward(key, value) to send on the keys and values I need, and for debugging purposes I also print them. I have a KStream mergedStream that results from a join of two other streams, and I want to apply the processor to the records of that stream. I achieve this with mergedStream.process(myprocessor, "stateStoreName"). When I start this program, I can see
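To make the process()/forward() shape concrete, here is a stand-in sketch: a HashMap plays the role of the KeyValueStore and a List plays the role of context.forward, with deduplication as one example of the "complex logic" (the store name and the dedup rule are assumptions, not from the question):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DedupFilter {
    // Stand-in for the Processor's KeyValueStore ("stateStoreName").
    private final Map<String, String> store = new HashMap<>();
    // Stand-in for records passed to context.forward(key, value).
    private final List<String> forwarded = new ArrayList<>();

    // Mirrors Processor#process(key, value): forward a record only the
    // first time its key is seen.
    public void process(String key, String value) {
        if (!store.containsKey(key)) {
            store.put(key, value);
            forwarded.add(key + "=" + value); // the question also prints these for debugging
        }
    }

    public List<String> forwarded() {
        return forwarded;
    }
}
```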

Test Kafka Streams topology

不羁岁月 Submitted on 2019-11-27 06:10:01
Question: I'm searching for a way to test a Kafka Streams application, so that I can define the input events and the test suite shows me the output. Is this possible without a real Kafka setup? Answer 1: Update for Kafka 1.1.0 (released 23-Mar-2018): KIP-247 added official test utils. Per the Upgrade Guide: there is a new artifact, kafka-streams-test-utils, providing the TopologyTestDriver, ConsumerRecordFactory, and OutputVerifier classes. You can include the new artifact as a regular dependency in your unit tests and
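As a Maven snippet, the dependency the answer refers to (artifact name and version taken from the answer; the groupId is the standard org.apache.kafka one) would look like:

```xml
<dependency>
  <groupId>org.apache.kafka</groupId>
  <artifactId>kafka-streams-test-utils</artifactId>
  <version>1.1.0</version>
  <scope>test</scope>
</dependency>
```

With it on the test classpath, a TopologyTestDriver can pipe records through the topology entirely in-process, with no broker or ZooKeeper.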

External system queries during Kafka Stream processing

大兔子大兔子 Submitted on 2019-11-27 02:25:35
Question: I'm trying to design a streaming architecture for streaming analytics. Requirements: RT and NRT streaming data input; stream processors implementing some financial analysis; RT and NRT analysis output streams; reference-data requests during stream processing. I'm exploring Kafka and Kafka Streams for stream processing and RT/NRT real-time messaging. My question is: I need to query external systems (information providers, MongoDB, etc.) during stream processing. These queries could be both
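Since per-record blocking calls to an external system dominate latency in this setup, one common mitigation (an assumption on my part, as the answer is not shown) is to cache reference-data lookups inside the processor. A broker-free sketch, where the Function stands in for the MongoDB/provider query:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class CachedLookup<K, V> {
    // Memoize reference-data lookups so each distinct key hits the
    // external system only once per processor instance.
    private final Map<K, V> cache = new HashMap<>();
    private final Function<K, V> remote; // stand-in for the external query
    private int remoteCalls = 0;

    public CachedLookup(Function<K, V> remote) {
        this.remote = remote;
    }

    public V get(K key) {
        return cache.computeIfAbsent(key, k -> {
            remoteCalls++;
            return remote.apply(k);
        });
    }

    public int remoteCalls() {
        return remoteCalls;
    }
}
```

For slowly changing reference data, an alternative is to ingest it into a Kafka topic and join against it as a table instead of querying at processing time.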

Print Kafka Stream Input out to console?

隐身守侯 Submitted on 2019-11-27 01:54:46
Question: I've been looking through a lot of the Kafka documentation for a Java application I am working on. I've tried getting into the lambda syntax introduced in Java 8, but I'm on shaky ground there and not yet confident it's what I should use. I have a Kafka/ZooKeeper service running without any trouble, and what I want to do is write a small example program that will simply write its input out, but not do a word count, as there are so many examples of
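The lambda shape the question is unsure about is the same one Kafka Streams accepts in KStream#peek for console output; here it is in plain Java so the formatting can be checked without a broker (the key and value strings are examples):

```java
import java.util.function.BiFunction;

public class ConsolePrinter {
    // The (key, value) -> ... lambda shape used with KStream#peek;
    // extracted as a BiFunction so the formatted line is testable.
    public static final BiFunction<String, String, String> FORMAT =
        (key, value) -> key + ": " + value;

    public static void main(String[] args) {
        System.out.println(FORMAT.apply("sensor-1", "hello"));
    }
}
```

In a Streams topology the equivalent would be `stream.peek((key, value) -> System.out.println(key + ": " + value))` before writing to the output topic.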