apache-flink

WindowFunction cannot be applied using WindowStream.apply() function

Submitted by 不打扰是莪最后的温柔 on 2019-12-24 00:54:45
Question: I'm relatively new to Apache Flink and Scala, and am just getting to grips with some of the basic functionality. I've hit a wall trying to implement a custom WindowFunction. The problem is that when I try to pass my custom WindowFunction to ".apply()", the IDE reports: Cannot resolve symbol apply. Unspecified value parameters: foldFunction: (NotInferedR, Data.Fingerprint) => NotInferedR, windowFunction: (Tuple, TimeWindow, Iterable[NotInferedR], Collector
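The "Unspecified value parameters" message usually means the compiler resolved `.apply()` to the fold-plus-window overload because the supplied WindowFunction's type parameters did not line up with the stream. A minimal sketch of the matching shape, shown in Java for brevity (`Data.Fingerprint` comes from the question; the output type, keyBy field and window size are assumptions, and this is not compiled against a Flink project):

```java
// Uncompiled sketch: the four type parameters must match the stream:
// IN = Data.Fingerprint, OUT = String (assumed), KEY = Tuple, W = TimeWindow.
public class MyWindowFunction
        implements WindowFunction<Data.Fingerprint, String, Tuple, TimeWindow> {
    @Override
    public void apply(Tuple key, TimeWindow window,
                      Iterable<Data.Fingerprint> input, Collector<String> out) {
        int count = 0;
        for (Data.Fingerprint f : input) count++;   // example aggregation
        out.collect(key + ": " + count + " fingerprints in " + window);
    }
}
// stream.keyBy(0).timeWindow(Time.seconds(10)).apply(new MyWindowFunction());
```

Once the type parameters agree with the keyed, windowed stream, the single-argument `apply(WindowFunction)` overload resolves.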

Recovering state consistency in Flink when using Kafka as EventStore

Submitted by 夙愿已清 on 2019-12-24 00:43:28
Question: The problem: I am implementing a microservice as an event-sourcing aggregate which, in turn, is implemented as a Flink FlatMapFunction. In the basic setup, the aggregate reads events and commands from two Kafka topics. It then writes new events to the first topic and processing results to a third topic. Therefore, Kafka acts as the event store. Hope this drawing helps: RPC Request RPC Result | | ~~~~> Commands-| |---> Results ~~~~~~| |-->Aggregate--| ~> Input evs. -| |---> output evs. ~~~ |
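The core of the recovery story is independent of Flink: if Kafka retains the full event log, the aggregate's state is a pure function of that log and can be rebuilt after a failure by replaying it from the beginning. A plain-JDK sketch of that idea (the Event type and the deposit/withdraw semantics are invented for illustration):

```java
import java.util.Arrays;
import java.util.List;

// Minimal event-sourcing replay: state is fully determined by the event log,
// so re-reading the log reconstructs a consistent state after a crash.
public class Replay {
    static final class Event {
        final String type;
        final int amount;
        Event(String type, int amount) { this.type = type; this.amount = amount; }
    }

    static int replay(List<Event> log) {
        int balance = 0;                              // the aggregate's state
        for (Event e : log) {
            if ("deposit".equals(e.type))  balance += e.amount;
            if ("withdraw".equals(e.type)) balance -= e.amount;
        }
        return balance;                               // same log => same state
    }

    public static void main(String[] args) {
        List<Event> log = Arrays.asList(
                new Event("deposit", 100), new Event("withdraw", 30));
        System.out.println(replay(log));              // prints 70
    }
}
```

In the Flink setting, the same guarantee is normally obtained by checkpointing the FlatMapFunction's state together with the Kafka consumer offsets, rather than replaying from offset zero.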

Flink: how does the parallelism set in the Jobmanager UI relate to task slots?

Submitted by 别说谁变了你拦得住时间么 on 2019-12-23 20:12:04
Question: Let's say I have 8 task managers with 16 task slots. If I submit a job using the Jobmanager UI and set the parallelism to 8, do I only utilise 8 task slots? What if I have 8 task managers with 8 slots, and submit the same job with a parallelism of 8? Is it exactly the same thing, or is there a difference in the way the data is processed? Thank you. Answer 1: The total number of task slots in a Flink cluster defines the maximum parallelism, but the number of slots used may exceed the actual
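The two clusters in the question can be compared with a rule of thumb: under Flink's default slot sharing, one slot can hold one parallel subtask of every operator in the job, so the number of slots a job occupies equals its highest operator parallelism, regardless of how the cluster spreads those slots over task managers. A plain-Java sketch of that arithmetic (no Flink involved):

```java
// Rule-of-thumb helper: with default slot sharing, slots needed = max
// operator parallelism. Both an 8x16 and an 8x8 cluster therefore run a
// parallelism-8 job in 8 slots; the remaining slots simply stay idle.
public class Slots {
    static int slotsNeeded(int... operatorParallelism) {
        int max = 0;
        for (int p : operatorParallelism) max = Math.max(max, p);
        return max;
    }

    public static void main(String[] args) {
        System.out.println(slotsNeeded(8, 8, 8));   // 3 operators at parallelism 8 -> 8 slots
    }
}
```

Where the 8 occupied slots physically land (spread over 8 machines vs. packed onto fewer) can still affect memory pressure and network traffic, but not the logical result.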

How to debug serializable exception in Flink?

Submitted by 拈花ヽ惹草 on 2019-12-23 20:06:19
Question: I've encountered several serializable exceptions, and I did some searching in Flink's docs and on the internet; there are some well-known solutions, like marking fields transient, extending Serializable, etc. Each time the origin of the exception is very clear, but in my case I am unable to find where exactly something is not serialized. Q: How should I debug this kind of exception? A.scala: class executor(val sink: SinkFunction[List[String]]) { def exe(): Unit = { xxx.....addSink(sink) } } B.scala: class Main extends App { def
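One practical way to locate the offending field without Flink in the loop: serialize the suspect object yourself with plain JDK serialization; the resulting NotSerializableException names the first non-serializable class reachable from it. A self-contained sketch (the Holder/NotOk classes are invented stand-ins for the real function and its captured state):

```java
import java.io.*;

// Probe an object graph with ObjectOutputStream: the exception message
// names exactly which class in the graph is not serializable.
public class SerDebug {
    static class NotOk { }                       // does NOT implement Serializable
    static class Holder implements Serializable {
        String name = "ok";
        NotOk bad = new NotOk();                 // this field breaks serialization
    }

    static String findProblem(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return "serializable";
        } catch (NotSerializableException e) {
            return "not serializable: " + e.getMessage();  // the culprit's class name
        } catch (IOException e) {
            return "io error: " + e;
        }
    }

    public static void main(String[] args) {
        System.out.println(findProblem(new SerDebug.Holder()));
    }
}
```

In a Flink job, the same probe can be applied to each function object before `addSink`/`map` to find which closure drags in non-serializable state.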

Using Apache Flink for data streaming

Submitted by 杀马特。学长 韩版系。学妹 on 2019-12-23 18:24:21
Question: I am working on building an application with the requirements below, and I am just getting started with Flink. 1. Ingest data into Kafka with, say, 50 partitions (incoming rate: 100,000 msgs/sec). 2. Read the data from Kafka and process each record in real time (do some computation, compare with old data, etc.). 3. Store the output in Cassandra. I was looking for a real-time streaming platform and found Flink to be a great fit for both real-time and batch. Do you think Flink is the best fit for my use case or should I
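For reference, the pipeline described maps onto Flink roughly as below. This is an uncompiled sketch assuming the flink-connector-kafka-0.11 and flink-connector-cassandra artifacts are on the classpath; the topic, hosts, table and the Computation map function are all placeholders:

```java
// Uncompiled sketch of the Kafka -> compute -> Cassandra pipeline.
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setParallelism(50);                       // one source subtask per Kafka partition

Properties props = new Properties();
props.setProperty("bootstrap.servers", "kafka:9092");   // placeholder broker
DataStream<String> input = env.addSource(
        new FlinkKafkaConsumer011<>("events", new SimpleStringSchema(), props));

// Computation is a hypothetical MapFunction<String, Tuple2<String, Double>>
DataStream<Tuple2<String, Double>> results = input.map(new Computation());

CassandraSink.addSink(results)
        .setQuery("INSERT INTO ks.results (id, value) VALUES (?, ?);")
        .setHost("cassandra-host")
        .build();

env.execute("kafka-to-cassandra");
```

At 100,000 msgs/sec the main tuning knobs are the source parallelism (matching the 50 partitions) and the state backend used for the "compare with old data" step.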

java.lang.NoSuchMethodException for init method in Scala case class

Submitted by 强颜欢笑 on 2019-12-23 16:47:11
Question: I am writing an Apache Flink streaming application that deserializes data (Avro format) read off a Kafka bus (more details here). The data is deserialized into a Scala case class. I get an exception when I run the program and it receives the first message from Kafka: Exception in thread "main" org.apache.flink.runtime.client.JobExecutionException: java.lang.RuntimeException: java.lang.NoSuchMethodException: org.myorg.quickstart.DeviceData.<init>() at org.apache.flink.runtime
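The `NoSuchMethodException: ...DeviceData.<init>()` means some runtime path tried to instantiate the class reflectively through a no-argument constructor, which a Scala case class with parameters does not generate. The mechanism can be reproduced with plain reflection (`DeviceData` here is a hypothetical Java stand-in for the case class):

```java
// Reproduces the failure mode: frameworks that call Class.newInstance need
// a visible no-arg constructor; a class with only parameterized constructors
// triggers exactly this NoSuchMethodException for <init>().
public class CtorDemo {
    static class DeviceData {
        final String id;
        DeviceData(String id) { this.id = id; }   // only a 1-arg constructor
    }

    static boolean hasNoArgCtor(Class<?> c) {
        try {
            c.getDeclaredConstructor();           // looks up <init>()
            return true;
        } catch (NoSuchMethodException e) {
            return false;                         // the error from the question
        }
    }

    public static void main(String[] args) {
        System.out.println(hasNoArgCtor(DeviceData.class)); // false
        System.out.println(hasNoArgCtor(String.class));     // true
    }
}
```

The usual fixes are to supply default values for all case-class fields (so Scala emits a no-arg constructor) or to hand the framework a deserializer that builds the object itself instead of instantiating it reflectively.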

Apache Flink: Skewed data distribution on KeyedStream

Submitted by 微笑、不失礼 on 2019-12-23 16:28:11
Question: I have this Java code in Flink: env.setParallelism(6); //Read from Kafka topic with 12 partitions DataStream<String> line = env.addSource(myConsumer); //Filter half of the records DataStream<Tuple2<String, Integer>> line_Num_Odd = line_Num.filter(new FilterOdd()); DataStream<Tuple3<String, String, Integer>> line_Num_Odd_2 = line_Num_Odd.map(new OddAdder()); //Filter the other half DataStream<Tuple2<String, Integer>> line_Num_Even = line_Num.filter(new FilterEven()); DataStream<Tuple3<String,
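A likely explanation for the skew: keyBy routes records to subtasks by a hash of the key, and with only a couple of distinct keys (such as odd/even markers) most subtasks never receive data at all. The effect can be sketched with plain Java hashing, which simplifies Flink's actual key-group mapping:

```java
// Simplified key -> subtask routing: with two distinct keys and parallelism 6,
// at most two of the six subtasks ever see data, and the two keys may even
// collide onto the same subtask.
public class Skew {
    static int subtaskFor(String key, int parallelism) {
        return Math.abs(key.hashCode() % parallelism);   // simplified routing
    }

    public static void main(String[] args) {
        int parallelism = 6;
        System.out.println("odd  -> subtask " + subtaskFor("odd", parallelism));
        System.out.println("even -> subtask " + subtaskFor("even", parallelism));
        // A common mitigation is salting the key, e.g. key + "#" + random(0..n),
        // so a hot key is spread over n subtasks before a final aggregation.
    }
}
```

The fix is usually to key on something with higher cardinality, or to pre-aggregate under a salted key and merge the partial results afterwards.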

Flink webui when running from IDE

Submitted by 自闭症网瘾萝莉.ら on 2019-12-23 07:47:45
Question: I am trying to see my job in the web UI. I use createLocalEnvironmentWithWebUI; the code runs fine in the IDE, but it is impossible to see my job at http://localhost:8081/#/overview val conf: Configuration = new Configuration() import org.apache.flink.configuration.ConfigConstants conf.setBoolean(ConfigConstants.LOCAL_START_WEBSERVER, true) val env = StreamExecutionEnvironment.createLocalEnvironmentWithWebUI(conf) env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime) val rides = env.addSource
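A common cause, offered here as an assumption since the question doesn't show the build file: createLocalEnvironmentWithWebUI only starts the dashboard if the web runtime is on the classpath. Adding the matching dependency (the Scala suffix and version property are placeholders for your setup) usually makes the UI appear:

```xml
<!-- Assumption: the local web UI needs flink-runtime-web on the classpath;
     pick the artifact matching your Flink and Scala versions. -->
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-runtime-web_2.11</artifactId>
    <version>${flink.version}</version>
</dependency>
```

Note also that the local UI only lives as long as the job: once the program finishes, http://localhost:8081 stops responding.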

How to deserialize Avro messages from Kafka in Flink (Scala)?

Submitted by 强颜欢笑 on 2019-12-23 04:54:28
Question: I'm reading messages from Kafka into the Flink shell (Scala), as follows: scala> val stream = senv.addSource(new FlinkKafkaConsumer011[String]("topic", new SimpleStringSchema(), properties)).print() warning: there was one deprecation warning; re-run with -deprecation for details stream: org.apache.flink.streaming.api.datastream.DataStreamSink[String] = org.apache.flink.streaming.api.datastream.DataStreamSink@71de1091 Here, I'm using SimpleStringSchema() as the deserializer, but actually the
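To get Avro records instead of raw strings, SimpleStringSchema can be replaced with a custom DeserializationSchema. An uncompiled Java sketch, assuming MyRecord is an Avro-generated specific record class (the class and topic names are placeholders):

```java
// Uncompiled sketch: decode Avro-encoded Kafka bytes into MyRecord objects.
public class AvroRecordSchema implements DeserializationSchema<MyRecord> {
    @Override
    public MyRecord deserialize(byte[] message) throws IOException {
        DatumReader<MyRecord> reader = new SpecificDatumReader<>(MyRecord.class);
        Decoder decoder = DecoderFactory.get().binaryDecoder(message, null);
        return reader.read(null, decoder);
    }

    @Override
    public boolean isEndOfStream(MyRecord next) { return false; }

    @Override
    public TypeInformation<MyRecord> getProducedType() {
        return TypeInformation.of(MyRecord.class);
    }
}
// new FlinkKafkaConsumer011<>("topic", new AvroRecordSchema(), properties)
```

If the messages were produced through a schema registry, the bytes carry a registry header, and a registry-aware deserializer is needed instead of this plain binary decoder.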