apache-flink

WindowFunction cannot be applied using WindowStream.apply() function

Submitted by 不打扰是莪最后的温柔 on 2019-12-24 00:54:45
Question: I'm relatively new to Apache Flink and Scala, and am just getting to grips with some of the basic functionality. I've hit a wall trying to implement a custom WindowFunction. The problem is that when I try to pass my custom WindowFunction to ".apply()", the IDE reports: Cannot resolve symbol apply. Unspecified value parameters: foldFunction: (NotInferedR, Data.Fingerprint) => NotInferedR, windowFunction: (Tuple, TimeWindow, Iterable[NotInferedR], Collector
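The "Unspecified value parameters" message usually means the compiler resolved `.apply()` to the fold-plus-window overload because the supplied WindowFunction's type parameters did not line up with the stream. A minimal sketch of the matching shape, shown in Java for brevity (`Data.Fingerprint` comes from the question; the output type, keyBy field and window size are assumptions, and this is not compiled against a Flink project):

```java
// Uncompiled sketch: the four type parameters must match the stream:
// IN = Data.Fingerprint, OUT = String (assumed), KEY = Tuple, W = TimeWindow.
public class MyWindowFunction
        implements WindowFunction<Data.Fingerprint, String, Tuple, TimeWindow> {
    @Override
    public void apply(Tuple key, TimeWindow window,
                      Iterable<Data.Fingerprint> input, Collector<String> out) {
        int count = 0;
        for (Data.Fingerprint f : input) count++;   // example aggregation
        out.collect(key + ": " + count + " fingerprints in " + window);
    }
}
// stream.keyBy(0).timeWindow(Time.seconds(10)).apply(new MyWindowFunction());
```

Once the type parameters agree with the keyed, windowed stream, the single-argument `apply(WindowFunction)` overload resolves.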

Recovering state consistency in Flink when using Kafka as EventStore

Submitted by 夙愿已清 on 2019-12-24 00:43:28
Question: The problem: I am implementing a microservice as an event-sourcing aggregate which, in turn, is implemented as a Flink FlatMapFunction. In the basic setup, the aggregate reads events and commands from two Kafka topics. It then writes new events to the first topic and processing results to a third topic. Therefore, Kafka acts as the event store. Hope this drawing helps: RPC Request RPC Result | | ~~~~> Commands-| |---> Results ~~~~~~| |-->Aggregate--| ~> Input evs. -| |---> output evs. ~~~ |
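The core of the recovery story is independent of Flink: if Kafka retains the full event log, the aggregate's state is a pure function of that log and can be rebuilt after a failure by replaying it from the beginning. A plain-JDK sketch of that idea (the Event type and the deposit/withdraw semantics are invented for illustration):

```java
import java.util.Arrays;
import java.util.List;

// Minimal event-sourcing replay: state is fully determined by the event log,
// so re-reading the log reconstructs a consistent state after a crash.
public class Replay {
    static final class Event {
        final String type;
        final int amount;
        Event(String type, int amount) { this.type = type; this.amount = amount; }
    }

    static int replay(List<Event> log) {
        int balance = 0;                              // the aggregate's state
        for (Event e : log) {
            if ("deposit".equals(e.type))  balance += e.amount;
            if ("withdraw".equals(e.type)) balance -= e.amount;
        }
        return balance;                               // same log => same state
    }

    public static void main(String[] args) {
        List<Event> log = Arrays.asList(
                new Event("deposit", 100), new Event("withdraw", 30));
        System.out.println(replay(log));              // prints 70
    }
}
```

In the Flink setting, the same guarantee is normally obtained by checkpointing the FlatMapFunction's state together with the Kafka consumer offsets, rather than replaying from offset zero.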

Flink: how does the parallelism set in the Jobmanager UI relate to task slots?

Submitted by 别说谁变了你拦得住时间么 on 2019-12-23 20:12:04
Question: Let's say I have 8 task managers with 16 task slots. If I submit a job using the Jobmanager UI and set the parallelism to 8, do I only utilise 8 task slots? What if I have 8 task managers with 8 slots, and submit the same job with a parallelism of 8? Is it exactly the same thing, or is there a difference in the way the data is processed? Thank you. Answer 1: The total number of task slots in a Flink cluster defines the maximum parallelism, but the number of slots used may exceed the actual
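The two clusters in the question can be compared with a rule of thumb: under Flink's default slot sharing, one slot can hold one parallel subtask of every operator in the job, so the number of slots a job occupies equals its highest operator parallelism, regardless of how the cluster spreads those slots over task managers. A plain-Java sketch of that arithmetic (no Flink involved):

```java
// Rule-of-thumb helper: with default slot sharing, slots needed = max
// operator parallelism. Both an 8x16 and an 8x8 cluster therefore run a
// parallelism-8 job in 8 slots; the remaining slots simply stay idle.
public class Slots {
    static int slotsNeeded(int... operatorParallelism) {
        int max = 0;
        for (int p : operatorParallelism) max = Math.max(max, p);
        return max;
    }

    public static void main(String[] args) {
        System.out.println(slotsNeeded(8, 8, 8));   // 3 operators at parallelism 8 -> 8 slots
    }
}
```

Where the 8 occupied slots physically land (spread over 8 machines vs. packed onto fewer) can still affect memory pressure and network traffic, but not the logical result.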

How to debug serializable exception in Flink?

Submitted by 拈花ヽ惹草 on 2019-12-23 20:06:19
Question: I've encountered several serializable exceptions, and I did some searching in Flink's docs and on the internet; there are some well-known solutions, like marking fields transient, extending Serializable, etc. Each time the origin of the exception is very clear, but in my case I am unable to find where exactly something is not serialized. Q: How should I debug this kind of exception? A.scala: class executor(val sink: SinkFunction[List[String]]) { def exe(): Unit = { xxx.....addSink(sink) } } B.scala: class Main extends App { def
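One practical way to locate the offending field without Flink in the loop: serialize the suspect object yourself with plain JDK serialization; the resulting NotSerializableException names the first non-serializable class reachable from it. A self-contained sketch (the Holder/NotOk classes are invented stand-ins for the real function and its captured state):

```java
import java.io.*;

// Probe an object graph with ObjectOutputStream: the exception message
// names exactly which class in the graph is not serializable.
public class SerDebug {
    static class NotOk { }                       // does NOT implement Serializable
    static class Holder implements Serializable {
        String name = "ok";
        NotOk bad = new NotOk();                 // this field breaks serialization
    }

    static String findProblem(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return "serializable";
        } catch (NotSerializableException e) {
            return "not serializable: " + e.getMessage();  // the culprit's class name
        } catch (IOException e) {
            return "io error: " + e;
        }
    }

    public static void main(String[] args) {
        System.out.println(findProblem(new SerDebug.Holder()));
    }
}
```

In a Flink job, the same probe can be applied to each function object before `addSink`/`map` to find which closure drags in non-serializable state.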

Using Apache Flink for data streaming

Submitted by 杀马特。学长 韩版系。学妹 on 2019-12-23 18:24:21
Question: I am working on building an application with the requirements below, and I am just getting started with Flink. 1. Ingest data into Kafka with, say, 50 partitions (incoming rate: 100,000 msgs/sec). 2. Read the data from Kafka and process each record in real time (do some computation, compare with old data, etc.). 3. Store the output in Cassandra. I was looking for a real-time streaming platform and found Flink to be a great fit for both real-time and batch. Do you think Flink is the best fit for my use case or should I
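For reference, the pipeline described maps onto Flink roughly as below. This is an uncompiled sketch assuming the flink-connector-kafka-0.11 and flink-connector-cassandra artifacts are on the classpath; the topic, hosts, table and the Computation map function are all placeholders:

```java
// Uncompiled sketch of the Kafka -> compute -> Cassandra pipeline.
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setParallelism(50);                       // one source subtask per Kafka partition

Properties props = new Properties();
props.setProperty("bootstrap.servers", "kafka:9092");   // placeholder broker
DataStream<String> input = env.addSource(
        new FlinkKafkaConsumer011<>("events", new SimpleStringSchema(), props));

// Computation is a hypothetical MapFunction<String, Tuple2<String, Double>>
DataStream<Tuple2<String, Double>> results = input.map(new Computation());

CassandraSink.addSink(results)
        .setQuery("INSERT INTO ks.results (id, value) VALUES (?, ?);")
        .setHost("cassandra-host")
        .build();

env.execute("kafka-to-cassandra");
```

At 100,000 msgs/sec the main tuning knobs are the source parallelism (matching the 50 partitions) and the state backend used for the "compare with old data" step.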

java.lang.NoSuchMethodException for init method in Scala case class

Submitted by 强颜欢笑 on 2019-12-23 16:47:11
Question: I am writing an Apache Flink streaming application that deserializes data (Avro format) read off a Kafka bus (more details here). The data is deserialized into a Scala case class. I get an exception when I run the program and it receives the first message from Kafka: Exception in thread "main" org.apache.flink.runtime.client.JobExecutionException: java.lang.RuntimeException: java.lang.NoSuchMethodException: org.myorg.quickstart.DeviceData.<init>() at org.apache.flink.runtime
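The `NoSuchMethodException: ...DeviceData.<init>()` means some runtime path tried to instantiate the class reflectively through a no-argument constructor, which a Scala case class with parameters does not generate. The mechanism can be reproduced with plain reflection (`DeviceData` here is a hypothetical Java stand-in for the case class):

```java
// Reproduces the failure mode: frameworks that call Class.newInstance need
// a visible no-arg constructor; a class with only parameterized constructors
// triggers exactly this NoSuchMethodException for <init>().
public class CtorDemo {
    static class DeviceData {
        final String id;
        DeviceData(String id) { this.id = id; }   // only a 1-arg constructor
    }

    static boolean hasNoArgCtor(Class<?> c) {
        try {
            c.getDeclaredConstructor();           // looks up <init>()
            return true;
        } catch (NoSuchMethodException e) {
            return false;                         // the error from the question
        }
    }

    public static void main(String[] args) {
        System.out.println(hasNoArgCtor(DeviceData.class)); // false
        System.out.println(hasNoArgCtor(String.class));     // true
    }
}
```

The usual fixes are to supply default values for all case-class fields (so Scala emits a no-arg constructor) or to hand the framework a deserializer that builds the object itself instead of instantiating it reflectively.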

Apache Flink: Skewed data distribution on KeyedStream

Submitted by 微笑、不失礼 on 2019-12-23 16:28:11
Question: I have this Java code in Flink: env.setParallelism(6); //Read from Kafka topic with 12 partitions DataStream<String> line = env.addSource(myConsumer); //Filter half of the records DataStream<Tuple2<String, Integer>> line_Num_Odd = line_Num.filter(new FilterOdd()); DataStream<Tuple3<String, String, Integer>> line_Num_Odd_2 = line_Num_Odd.map(new OddAdder()); //Filter the other half DataStream<Tuple2<String, Integer>> line_Num_Even = line_Num.filter(new FilterEven()); DataStream<Tuple3<String,
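A likely explanation for the skew: keyBy routes records to subtasks by a hash of the key, and with only a couple of distinct keys (such as odd/even markers) most subtasks never receive data at all. The effect can be sketched with plain Java hashing, which simplifies Flink's actual key-group mapping:

```java
// Simplified key -> subtask routing: with two distinct keys and parallelism 6,
// at most two of the six subtasks ever see data, and the two keys may even
// collide onto the same subtask.
public class Skew {
    static int subtaskFor(String key, int parallelism) {
        return Math.abs(key.hashCode() % parallelism);   // simplified routing
    }

    public static void main(String[] args) {
        int parallelism = 6;
        System.out.println("odd  -> subtask " + subtaskFor("odd", parallelism));
        System.out.println("even -> subtask " + subtaskFor("even", parallelism));
        // A common mitigation is salting the key, e.g. key + "#" + random(0..n),
        // so a hot key is spread over n subtasks before a final aggregation.
    }
}
```

The fix is usually to key on something with higher cardinality, or to pre-aggregate under a salted key and merge the partial results afterwards.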

Flink webui when running from IDE

Submitted by 自闭症网瘾萝莉.ら on 2019-12-23 07:47:45
Question: I am trying to see my job in the web UI. I use createLocalEnvironmentWithWebUI; the code runs fine in the IDE, but it is impossible to see my job at http://localhost:8081/#/overview val conf: Configuration = new Configuration() import org.apache.flink.configuration.ConfigConstants conf.setBoolean(ConfigConstants.LOCAL_START_WEBSERVER, true) val env = StreamExecutionEnvironment.createLocalEnvironmentWithWebUI(conf) env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime) val rides = env.addSource
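A common cause, offered here as an assumption since the question doesn't show the build file: createLocalEnvironmentWithWebUI only starts the dashboard if the web runtime is on the classpath. Adding the matching dependency (the Scala suffix and version property are placeholders for your setup) usually makes the UI appear:

```xml
<!-- Assumption: the local web UI needs flink-runtime-web on the classpath;
     pick the artifact matching your Flink and Scala versions. -->
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-runtime-web_2.11</artifactId>
    <version>${flink.version}</version>
</dependency>
```

Note also that the local UI only lives as long as the job: once the program finishes, http://localhost:8081 stops responding.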

How to deserialize Avro messages from Kafka in Flink (Scala)?

Submitted by 强颜欢笑 on 2019-12-23 04:54:28
Question: I'm reading messages from Kafka into the Flink shell (Scala), as follows: scala> val stream = senv.addSource(new FlinkKafkaConsumer011[String]("topic", new SimpleStringSchema(), properties)).print() warning: there was one deprecation warning; re-run with -deprecation for details stream: org.apache.flink.streaming.api.datastream.DataStreamSink[String] = org.apache.flink.streaming.api.datastream.DataStreamSink@71de1091 Here, I'm using SimpleStringSchema() as the deserializer, but actually the
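To get Avro records instead of raw strings, SimpleStringSchema can be replaced with a custom DeserializationSchema. An uncompiled Java sketch, assuming MyRecord is an Avro-generated specific record class (the class and topic names are placeholders):

```java
// Uncompiled sketch: decode Avro-encoded Kafka bytes into MyRecord objects.
public class AvroRecordSchema implements DeserializationSchema<MyRecord> {
    @Override
    public MyRecord deserialize(byte[] message) throws IOException {
        DatumReader<MyRecord> reader = new SpecificDatumReader<>(MyRecord.class);
        Decoder decoder = DecoderFactory.get().binaryDecoder(message, null);
        return reader.read(null, decoder);
    }

    @Override
    public boolean isEndOfStream(MyRecord next) { return false; }

    @Override
    public TypeInformation<MyRecord> getProducedType() {
        return TypeInformation.of(MyRecord.class);
    }
}
// new FlinkKafkaConsumer011<>("topic", new AvroRecordSchema(), properties)
```

If the messages were produced through a schema registry, the bytes carry a registry header, and a registry-aware deserializer is needed instead of this plain binary decoder.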