apache-flink

Cannot see messages when sinking a Kafka stream, and cannot see print messages, in Flink 1.2

Submitted by 南笙酒味 on 2019-12-11 17:45:56
Question: My goal is to use Kafka to read in a string in JSON format, apply a filter to the string, and then sink the message out (still as a JSON string). For testing purposes, my input string message looks like: {"a":1,"b":2} And my implementation code is:

def main(args: Array[String]): Unit = {
  // parse input arguments
  val params = ParameterTool.fromArgs(args)
  if (params.getNumberOfParameters < 4) {
    println("Missing parameters!\n" +
      "Usage: Kafka --input-topic <topic> --output-topic <topic> " +
      "
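A minimal sketch of how this pipeline could look in Flink 1.2's Scala API with the 0.10 Kafka connector; the topic names, broker address, consumer group, and filter predicate are placeholders, not taken from the original post. Note also that on a cluster, print() output lands in the TaskManagers' .out files rather than the client console, which is a common reason for "missing" print messages.

import java.util.Properties

import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.connectors.kafka.{FlinkKafkaConsumer010, FlinkKafkaProducer010}
import org.apache.flink.streaming.util.serialization.SimpleStringSchema

object KafkaJsonFilter {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    val props = new Properties()
    props.setProperty("bootstrap.servers", "localhost:9092") // assumed broker address
    props.setProperty("group.id", "test-group")              // assumed consumer group

    // Read JSON strings from the input topic.
    val stream = env.addSource(
      new FlinkKafkaConsumer010[String]("input-topic", new SimpleStringSchema(), props))

    // Placeholder predicate: keep messages containing the field "a".
    val filtered = stream.filter(_.contains("\"a\""))

    // Sink the surviving JSON strings back to Kafka, unchanged.
    filtered.addSink(
      new FlinkKafkaProducer010[String]("output-topic", new SimpleStringSchema(), props))

    env.execute("Kafka JSON filter")
  }
}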

Why are Kafka messages not consumed on time when using Flink streaming SQL with GROUP BY TUMBLE(rowtime)?

Submitted by 这一生的挚爱 on 2019-12-11 17:22:25
Question: When I produce 20 messages, only 13 are consumed; the remaining 7 are not consumed in real time. Some time later, when I produce another 20 messages, the 7 leftover messages from the previous batch finally get consumed. Complete code is at: https://github.com/shaozhipeng/flink-quickstart/blob/master/src/main/java/me/icocoro/quickstart/streaming/sql/KafkaStreamToJDBCTable.java Update: trying a different AssignerWithPeriodicWatermarks was not effective. private static final String LOCAL_KAFKA_BROKER = "localhost
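For context, an event-time TUMBLE window only emits once a watermark at or past the window's end arrives; with a periodic out-of-orderness assigner the watermark trails the newest event seen, so the last few records wait in their window until later records push the watermark forward, which matches the behavior described. A sketch of the usual assigner shape, with the event type and timestamp field invented for illustration (the linked repository has the real ones):

import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor
import org.apache.flink.streaming.api.windowing.time.Time

case class POJO(aTime: Long) // assumed event type carrying an epoch-millis timestamp

// Emits watermarks that trail the largest timestamp seen by 3 seconds.
class MyExtractor extends BoundedOutOfOrdernessTimestampExtractor[POJO](Time.seconds(3)) {
  override def extractTimestamp(element: POJO): Long = element.aTime
}

// Usage (assumed): stream.assignTimestampsAndWatermarks(new MyExtractor)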

Convert an Apache Flink DataStream to a DataStream that makes tumbling windows of 2 events and sums a value

Submitted by 删除回忆录丶 on 2019-12-11 16:59:42
Question: I have a Flink Table with the following columns: final String[] hNames = {"mID", "dateTime", "mValue", "unixDateTime", "mType"}; I want to create a DataStream in Apache Flink that makes tumbling windows of 2 events each and calculates the average mValue for each window. Below I've used the SUM function, since there doesn't seem to be an AVG function. These windows must be grouped on the mID (an Integer) or dateTime column. I key the windows by the column mType, since these represent a
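Since windowed streams have no built-in AVG, an AggregateFunction can compute the average directly instead of SUM, and countWindow(2) produces tumbling count windows of 2 events. A sketch assuming a case class built from the question's column names:

import org.apache.flink.api.common.functions.AggregateFunction

case class Measurement(mID: Int, dateTime: String, mValue: Double, unixDateTime: Long, mType: String)

// Keeps a running (sum, count) so the average falls out when the window closes.
class AvgMValue extends AggregateFunction[Measurement, (Double, Long), Double] {
  override def createAccumulator(): (Double, Long) = (0.0, 0L)
  override def add(m: Measurement, acc: (Double, Long)): (Double, Long) =
    (acc._1 + m.mValue, acc._2 + 1)
  override def getResult(acc: (Double, Long)): Double = acc._1 / acc._2
  override def merge(a: (Double, Long), b: (Double, Long)): (Double, Long) =
    (a._1 + b._1, a._2 + b._2)
}

// Usage (assumed stream of Measurement):
//   measurements.keyBy(_.mType).countWindow(2).aggregate(new AvgMValue)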

Streaming Data Processing and nanosecond time resolution

Submitted by ﹥>﹥吖頭↗ on 2019-12-11 16:58:17
Question: I'm just getting started with real-time stream data processing frameworks, and I have a question to which I have not yet found a conclusive answer: do the usual suspects (Apache Spark, Kafka, Storm, Flink, etc.) support processing data with an event-time resolution of nanoseconds (or even picoseconds)? Most people and most documentation talk about millisecond or microsecond resolution, but I was unable to find a definite answer on whether finer resolution is possible or a problem.
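For Flink specifically, event-time timestamps are plain longs that the runtime interprets as milliseconds since the epoch, so sub-millisecond ordering is not natively supported; a common workaround is to keep the full nanosecond value in the record and derive the millisecond timestamp for Flink. A sketch with an assumed record shape:

import org.apache.flink.streaming.api.TimeCharacteristic
import org.apache.flink.streaming.api.scala._

case class Reading(nanoTime: Long, value: Double) // assumed record shape

object NanoTimestamps {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)

    val readings = env.fromElements(
      Reading(1546300800000000000L, 1.0),
      Reading(1546300800000000500L, 2.0))

    // Flink orders and windows by the millisecond-precision timestamp; the
    // full nanosecond value stays in the payload for finer-grained logic.
    val withTs = readings.assignAscendingTimestamps(_.nanoTime / 1000000L)
    withTs.print()

    env.execute("nanosecond timestamps")
  }
}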

Apache Flink add new stream dynamically

Submitted by 安稳与你 on 2019-12-11 16:56:37
Question: Is it possible in Apache Flink to add a new datastream dynamically at runtime without restarting the job? As far as I understand, a usual Flink program looks like this:

val env = StreamExecutionEnvironment.getExecutionEnvironment()
val text = env.socketTextStream(hostname, port, "\n")
val windowCounts = text.map...
env.execute("Socket Window WordCount")

In my case it is possible that, e.g., a new device is started and therefore another stream must be processed. But how to add this new
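A Flink topology is fixed once execute() is called, so sources cannot be attached at runtime; the usual workaround is a single source that carries every device's events plus a keyBy on the device id, so a new device simply shows up as a new key. A sketch with invented field names and ingest point:

import org.apache.flink.streaming.api.scala._

case class DeviceEvent(deviceId: String, payload: String) // assumed event shape

object DynamicDevices {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    // One long-running ingest point for all devices (placeholder source).
    val events = env
      .socketTextStream("localhost", 9999)
      .map { line =>
        val Array(id, payload) = line.split(",", 2)
        DeviceEvent(id, payload)
      }

    // keyBy scopes state and windows to each deviceId, so a device that
    // appears for the first time is handled without touching the topology.
    val perDevice = events.keyBy(_.deviceId)
    perDevice.map(e => s"${e.deviceId}: ${e.payload}").print()

    env.execute("device-multiplexed stream")
  }
}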

How to implement FlinkKafkaProducer serializer for Kafka 2.2

Submitted by 亡梦爱人 on 2019-12-11 16:31:32
Question: I've been working on updating a Flink processor (Flink version 1.9) that reads from Kafka and then writes to Kafka. We wrote this processor to run against a Kafka 0.10.2 cluster, and now we have deployed a new Kafka cluster running version 2.2. Therefore I set out to update the processor to use the latest FlinkKafkaConsumer and FlinkKafkaProducer (as suggested by the Flink docs). However, I've run into some problems with the Kafka producer: I'm unable to get it to serialize data using
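With the universal Kafka connector in Flink 1.9, FlinkKafkaProducer takes a KafkaSerializationSchema that builds the outgoing ProducerRecord itself. A sketch for a plain String payload; the topic name and broker address are placeholders:

import java.nio.charset.StandardCharsets
import java.util.Properties

import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.connectors.kafka.{FlinkKafkaProducer, KafkaSerializationSchema}
import org.apache.kafka.clients.producer.ProducerRecord

// The schema builds the outgoing ProducerRecord itself.
class StringSerializer(topic: String) extends KafkaSerializationSchema[String] {
  override def serialize(element: String, timestamp: java.lang.Long): ProducerRecord[Array[Byte], Array[Byte]] =
    new ProducerRecord[Array[Byte], Array[Byte]](topic, element.getBytes(StandardCharsets.UTF_8))
}

object KafkaWriteJob {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    val props = new Properties()
    props.setProperty("bootstrap.servers", "localhost:9092") // assumed broker

    val producer = new FlinkKafkaProducer[String](
      "output-topic",                       // placeholder default topic
      new StringSerializer("output-topic"),
      props,
      FlinkKafkaProducer.Semantic.AT_LEAST_ONCE)

    env.fromElements("{\"a\":1}", "{\"b\":2}").addSink(producer)
    env.execute("Kafka 2.2 producer")
  }
}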

Apache Flink - Event time windows

Submitted by 廉价感情. on 2019-12-11 16:17:46
Question: I want to create keyed windows in Apache Flink such that the window for each key fires n minutes after the arrival of the first event for that key. Can this be done using the event-time characteristic (processing time depends on the system clock, and it is uncertain when the first event will arrive)? If it is possible, please explain the assignment of event time and watermarks to the events, and also explain how to call the process window function after n minutes. Below is a part of
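Built-in event-time windows are aligned to the epoch rather than to each key's first event, so one way to fire n minutes after the first event per key is a KeyedProcessFunction that registers an event-time timer on the first element. A sketch with an assumed Event type and n fixed at 10 minutes; buffering of the window's contents is omitted, and the stream must have timestamps and watermarks assigned for the timer to fire:

import org.apache.flink.api.common.state.{ValueState, ValueStateDescriptor}
import org.apache.flink.streaming.api.functions.KeyedProcessFunction
import org.apache.flink.util.Collector

case class Event(key: String, payload: String) // assumed event shape

class FirstEventTimer extends KeyedProcessFunction[String, Event, String] {
  // Timestamp of this key's pending timer; null means none registered yet.
  private lazy val timerState: ValueState[java.lang.Long] =
    getRuntimeContext.getState(new ValueStateDescriptor("timer", classOf[java.lang.Long]))

  override def processElement(e: Event,
                              ctx: KeyedProcessFunction[String, Event, String]#Context,
                              out: Collector[String]): Unit = {
    if (timerState.value() == null) {
      // First event for this key: fire 10 minutes later in event time.
      val fireAt = ctx.timestamp() + 10 * 60 * 1000L
      ctx.timerService().registerEventTimeTimer(fireAt)
      timerState.update(fireAt)
    }
    // Elements would be buffered in state here if the window's contents matter.
  }

  override def onTimer(timestamp: Long,
                       ctx: KeyedProcessFunction[String, Event, String]#OnTimerContext,
                       out: Collector[String]): Unit = {
    out.collect(s"window for key ${ctx.getCurrentKey} fired at $timestamp")
    timerState.clear()
  }
}

// Usage (assumed): events.assignTimestampsAndWatermarks(...).keyBy(_.key).process(new FirstEventTimer)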

Authenticate with ECE ElasticSearch Sink from Apache Flink (Scala code)

Submitted by 浪尽此生 on 2019-12-11 15:32:16
Question: Compiler error when using the example provided in the Flink documentation. The Flink documentation provides sample Scala code to set the REST client factory parameters when talking to Elasticsearch: https://ci.apache.org/projects/flink/flink-docs-stable/dev/connectors/elasticsearch.html. When trying out this code I get a compiler error in IntelliJ which says "Cannot resolve symbol restClientBuilder". I found the following SO question which is EXACTLY my problem, except that it is in Java and I am doing this in
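The usual diagnosis is that Scala 2.11 does not convert the docs' Java-style lambda into the RestClientFactory single-method interface, so the factory has to be implemented explicitly. A sketch, where esSinkBuilder is the ElasticsearchSink.Builder from the docs example (elasticsearch6 connector assumed) and the credentials are placeholders:

import org.apache.flink.streaming.connectors.elasticsearch6.RestClientFactory
import org.apache.http.auth.{AuthScope, UsernamePasswordCredentials}
import org.apache.http.impl.client.BasicCredentialsProvider
import org.apache.http.impl.nio.client.HttpAsyncClientBuilder
import org.elasticsearch.client.RestClientBuilder

esSinkBuilder.setRestClientFactory(new RestClientFactory {
  override def configureRestClientBuilder(restClientBuilder: RestClientBuilder): Unit = {
    val credentialsProvider = new BasicCredentialsProvider()
    credentialsProvider.setCredentials(
      AuthScope.ANY, new UsernamePasswordCredentials("user", "password"))
    restClientBuilder.setHttpClientConfigCallback(
      new RestClientBuilder.HttpClientConfigCallback {
        override def customizeHttpClient(builder: HttpAsyncClientBuilder): HttpAsyncClientBuilder =
          builder.setDefaultCredentialsProvider(credentialsProvider)
      })
  }
})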

Is Flink Operator State thread-safe?

Submitted by 流过昼夜 on 2019-12-11 15:31:56
Question: "With Operator State (or non-keyed state), each operator state is bound to one parallel operator instance." The above quote is from the official Flink website. Each parallel operator instance may have a thread pool. When these threads access the operator state (as described above, each parallel operator instance can have one operator state), would that run into a thread-safety problem? Should I guard the Operator State with something like the following in Java? synchronized (stateInstance) { // update state } Answer 1: I
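Flink invokes an operator's element-processing and snapshot callbacks from a single task thread, synchronized with checkpointing by the runtime, so state touched only inside those callbacks needs no synchronized block; explicit locking only becomes necessary if you spawn your own threads inside the operator. A sketch of operator state used without locking; the counting logic is invented:

import org.apache.flink.api.common.functions.RichMapFunction
import org.apache.flink.api.common.state.{ListState, ListStateDescriptor}
import org.apache.flink.runtime.state.{FunctionInitializationContext, FunctionSnapshotContext}
import org.apache.flink.streaming.api.checkpoint.CheckpointedFunction

// Counts records with plain operator state and no locking; Flink calls
// map() and snapshotState() from the task thread, never concurrently.
class CountingMapper extends RichMapFunction[String, String] with CheckpointedFunction {
  private var count: Long = 0L
  private var checkpointed: ListState[java.lang.Long] = _

  override def map(value: String): String = {
    count += 1
    value
  }

  override def snapshotState(ctx: FunctionSnapshotContext): Unit = {
    checkpointed.clear()
    checkpointed.add(count)
  }

  override def initializeState(ctx: FunctionInitializationContext): Unit = {
    checkpointed = ctx.getOperatorStateStore.getListState(
      new ListStateDescriptor[java.lang.Long]("count", classOf[java.lang.Long]))
    val it = checkpointed.get().iterator()
    while (it.hasNext) count += it.next()
  }
}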

NFS (NetApp server) -> Flink -> S3

Submitted by 假如想象 on 2019-12-11 15:03:14
Question: I am new to Flink (Java) and am trying to move XML files from a NetApp file server, mounted as a file path on the server where Flink is installed. How can I do batch or stream processing in real time to fetch files arriving in the folder and sink them to S3? I couldn't find any examples in flink-starter that read files from the local file system; is Flink even the right choice for this use case? If so, where can I find resources on listening to a folder and managing checkpoints/savepoints? Answer 1: If your goal is simply
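One plausible shape for this job: watch the mounted directory continuously with readFile and write out through the StreamingFileSink. The paths, scan interval, and checkpoint interval are placeholders, and the cluster needs an S3 filesystem (e.g. flink-s3-fs-hadoop) plus checkpointing enabled, since the sink finalizes files on checkpoints. Reading whole XML files as single units would need a custom input format; this sketch treats the input as lines:

import org.apache.flink.api.common.serialization.SimpleStringEncoder
import org.apache.flink.api.java.io.TextInputFormat
import org.apache.flink.core.fs.Path
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink
import org.apache.flink.streaming.api.functions.source.FileProcessingMode
import org.apache.flink.streaming.api.scala._

object NfsToS3 {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.enableCheckpointing(60000) // the sink finalizes files on checkpoints

    // Rescan the mounted NFS directory every 10s for new files (placeholder paths).
    val input = env.readFile(
      new TextInputFormat(new Path("file:///mnt/netapp/xml")),
      "file:///mnt/netapp/xml",
      FileProcessingMode.PROCESS_CONTINUOUSLY,
      10000L)

    // Row-format sink to S3; requires an S3 filesystem plugin on the cluster.
    input.addSink(
      StreamingFileSink
        .forRowFormat(new Path("s3://my-bucket/flink-out"), new SimpleStringEncoder[String]("UTF-8"))
        .build())

    env.execute("NFS to S3")
  }
}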