apache-flink

Real-time streaming prediction in Flink using Scala

Submitted by 允我心安 on 2019-12-07 09:22:41
Question: Flink version: 1.2.0; Scala version: 2.11.8. I want to use a DataStream to make predictions with a model in Flink using Scala. I have a DataStream[String] in Flink (Scala) that contains JSON-formatted data from a Kafka source. I want to use this DataStream to predict against an already-trained FlinkML model. The problem is that all the FlinkML examples use the DataSet API for prediction. I am relatively new to Flink and Scala, so any help in the form of a code solution would be appreciated. Input: {
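A common workaround for the DataSet/DataStream mismatch above is to extract the trained model's parameters and apply them in a plain function inside a `map` on the stream. The sketch below assumes a linear model; the weights, the JSON field names (`x1`, `x2`), and the hand-rolled extractor are all hypothetical placeholders — in a real job the weights would come from the trained FlinkML model and a proper JSON library would do the parsing.

```scala
// Sketch: apply a trained linear model's weights inside a DataStream map.
// Weights/intercept are hypothetical stand-ins for values extracted from
// an already-trained FlinkML model (training still uses the DataSet API).
object StreamPredict {
  val weights: Array[Double] = Array(0.4, -1.2)
  val intercept: Double = 0.1

  // Minimal extractor for a flat JSON string such as {"x1": 2.0, "x2": 3.5};
  // use a real JSON library (e.g. for production) instead of a regex.
  def field(json: String, name: String): Double = {
    val pattern = ("\"" + name + "\"\\s*:\\s*(-?[0-9.]+)").r
    pattern.findFirstMatchIn(json).map(_.group(1).toDouble).getOrElse(0.0)
  }

  def predict(json: String): Double = {
    val features = Array(field(json, "x1"), field(json, "x2"))
    features.zip(weights).map { case (f, w) => f * w }.sum + intercept
  }

  def main(args: Array[String]): Unit =
    // In the Flink job this becomes: kafkaStream.map(predict _)
    println(predict("""{"x1": 2.0, "x2": 3.5}"""))
}
```

The pure `predict` function keeps the scoring logic testable outside Flink; only the one-line `map` wiring depends on the DataStream API.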

Flink CsvTableSource Streaming

Submitted by 允我心安 on 2019-12-07 09:21:18
Question: I want to stream a CSV file and perform SQL operations on it using Flink, but the code I have written only reads the file once and then stops; it does not stream. Thanks in advance. StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); StreamTableEnvironment tableEnv = StreamTableEnvironment.getTableEnvironment(env); CsvTableSource csvtable = CsvTableSource.builder() .path("D:/employee.csv") .ignoreFirstLine() .fieldDelimiter(",") .field("id", Types.INT()) .field("name", Types
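The behavior above is expected: CsvTableSource is a bounded source, so it reads the file once and the job finishes. To keep consuming the file as it grows, one option is to monitor it with `readFile` in `PROCESS_CONTINUOUSLY` mode and build the table from the resulting stream. The sketch below is untested (it needs the Flink dependency to compile) and the field names mirror the question's `employee.csv`:

```scala
// Untested sketch: monitor the CSV file continuously instead of using the
// bounded CsvTableSource. The 1000 ms argument is the monitoring interval.
import org.apache.flink.api.java.io.TextInputFormat
import org.apache.flink.core.fs.Path
import org.apache.flink.streaming.api.functions.source.FileProcessingMode
import org.apache.flink.streaming.api.scala._

val env = StreamExecutionEnvironment.getExecutionEnvironment
val path = "D:/employee.csv"
val lines: DataStream[String] = env.readFile(
  new TextInputFormat(new Path(path)), path,
  FileProcessingMode.PROCESS_CONTINUOUSLY, 1000L)

// parse "id,name" lines, then register as a table for SQL queries
val employees = lines.map { line =>
  val cols = line.split(",")
  (cols(0).toInt, cols(1))
}
```

Note that `PROCESS_CONTINUOUSLY` re-reads the whole file on every modification, which matters if the job must not reprocess old rows.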

Apache Flink - Send event if no data was received for x minutes

Submitted by 流过昼夜 on 2019-12-07 08:15:34
Question: How can I implement an operator with Flink's DataStream API that sends an event when no data has been received from a stream for a certain amount of time? Answer 1: Such an operator can be implemented using a ProcessFunction. DataStream<Long> input = env.fromElements(1L, 2L, 3L, 4L); input // use keyBy to have keyed state. // NullByteKeySelector will move all data to one task. You can also use other keys .keyBy(new NullByteKeySelector()) // use process function with 60 seconds timeout .process(new
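The answer's code is cut off before the ProcessFunction body. A sketch of what such a function typically looks like is below — untested (it needs the Flink dependency), with the 60-second timeout and the emitted message being illustrative choices:

```scala
// Untested sketch: emit a marker event when no element arrives for 60 s.
// Each element pushes a processing-time timer 60 s into the future and
// cancels the previous one; if a timer ever fires, the stream was silent.
import org.apache.flink.api.common.state.{ValueState, ValueStateDescriptor}
import org.apache.flink.configuration.Configuration
import org.apache.flink.streaming.api.functions.ProcessFunction
import org.apache.flink.util.Collector

class TimeoutFunction extends ProcessFunction[Long, String] {
  private var lastTimer: ValueState[Long] = _

  override def open(conf: Configuration): Unit =
    lastTimer = getRuntimeContext.getState(
      new ValueStateDescriptor[Long]("lastTimer", classOf[Long]))

  override def processElement(value: Long,
      ctx: ProcessFunction[Long, String]#Context,
      out: Collector[String]): Unit = {
    val next = ctx.timerService().currentProcessingTime() + 60000L
    if (lastTimer.value() != 0L)
      ctx.timerService().deleteProcessingTimeTimer(lastTimer.value())
    ctx.timerService().registerProcessingTimeTimer(next)
    lastTimer.update(next)
  }

  override def onTimer(timestamp: Long,
      ctx: ProcessFunction[Long, String]#OnTimerContext,
      out: Collector[String]): Unit =
    out.collect(s"no data received for 60 s (timer fired at $timestamp)")
}
```

Deleting the previous timer on each element keeps the timer count bounded; without it, every element would leave a pending timer behind.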

Apache Flink: What's the difference between side outputs and split() in the DataStream API?

Submitted by 时间秒杀一切 on 2019-12-07 00:08:14
Question: Apache Flink has a split API that lets you branch data streams: val splited = datastream.split { i => i match { case i if ... => Seq("red", "blue") case _ => Seq("green") }} splited.select("green").flatMap { .... } It also provides another approach called side outputs ( https://ci.apache.org/projects/flink/flink-docs-release-1.5/dev/stream/side_output.html ) that lets you do the same thing. What's the difference between these two approaches? Do they use the same lower-level construct? Do they
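For comparison with the `split` snippet above, this is roughly what the same routing looks like with side outputs — untested (needs the Flink dependency), with the tag names chosen to mirror the question. Unlike `split`, side outputs may carry a different element type than the main stream, and they work from any ProcessFunction:

```scala
// Untested sketch: side-output equivalent of the split/select example.
import org.apache.flink.streaming.api.functions.ProcessFunction
import org.apache.flink.streaming.api.scala._
import org.apache.flink.util.Collector

val redBlue = OutputTag[Int]("red-blue") // tag name is illustrative

val green: DataStream[Int] = datastream.process(
  new ProcessFunction[Int, Int] {
    override def processElement(i: Int,
        ctx: ProcessFunction[Int, Int]#Context,
        out: Collector[Int]): Unit =
      if (condition(i)) ctx.output(redBlue, i) // to the side output
      else out.collect(i)                      // to the main ("green") stream
  })

val redBlueStream: DataStream[Int] = green.getSideOutput(redBlue)
```

In later Flink releases `split` was deprecated in favor of side outputs, partly because consecutive `split` calls on the same stream behaved unintuitively.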

Apache Flink - custom Java options are not recognized inside job

Submitted by 可紊 on 2019-12-06 22:22:36
Question: I've added the following line to flink-conf.yaml: env.java.opts: "-Ddy.props.path=/PATH/TO/PROPS/FILE". When starting the jobmanager (jobmanager.sh start cluster) I see in the logs that the JVM option is indeed recognized: 2017-02-20 12:19:23,536 INFO org.apache.flink.runtime.jobmanager.JobManager - JVM Options: 2017-02-20 12:19:23,536 INFO org.apache.flink.runtime.jobmanager.JobManager - -Xms256m 2017-02-20 12:19:23,536 INFO org.apache.flink.runtime.jobmanager.JobManager - -Xmx256m 2017-02-20 12:19:23
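One thing worth checking in this situation: `env.java.opts` only affects the JVMs that the cluster scripts start (JobManager and TaskManagers). Code that runs in the client JVM during `flink run` — for example, anything executed while building the job graph — does not see it, so the system property must also be passed to the client (e.g. via the `JVM_ARGS` environment variable) or read through Flink's configuration/parameter mechanisms instead. A config fragment illustrating the cluster-side options (the path is the question's placeholder):

```yaml
# flink-conf.yaml -- applies to JVMs started by the cluster scripts
env.java.opts: "-Ddy.props.path=/PATH/TO/PROPS/FILE"

# per-process variants also exist, if the option should only apply
# to one side of the cluster:
env.java.opts.jobmanager: "-Ddy.props.path=/PATH/TO/PROPS/FILE"
env.java.opts.taskmanager: "-Ddy.props.path=/PATH/TO/PROPS/FILE"
```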

Apache Beam Counter/Metrics not available in Flink WebUI

Submitted by 时光怂恿深爱的人放手 on 2019-12-06 20:29:32
Question: I'm using Flink 1.4.1 and Beam 2.3.0, and would like to know whether it is possible to have metrics available in the Flink WebUI (or anywhere at all), as in the Dataflow WebUI. I've used a counter like: import org.apache.beam.sdk.metrics.Counter; import org.apache.beam.sdk.metrics.Metrics; ... Counter elementsRead = Metrics.counter(getClass(), "elements_read"); ... elementsRead.inc(); but I can't find the "elements_read" counts anywhere (Task Metrics or Accumulators) in the Flink WebUI. I thought this will

Differences between working with states and windows (time) in Flink streaming

Submitted by 为君一笑 on 2019-12-06 16:09:04
Question: Let's say we want to compute the sum and average of the items, and can work either with states or with windows (time). Example working with windows - https://ci.apache.org/projects/flink/flink-docs-release-0.10/apis/streaming_guide.html#example-program Example working with states - https://github.com/dataArtisans/flink-training-exercises/blob/master/src/main/java/com/dataartisans/flinktraining/exercises/datastream_java/ride_speed/RideSpeed.java Can I ask what would be the reasons to make
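Whichever mechanism drives the computation, the aggregation itself is the same fold over a (sum, count) accumulator: a window bounds the fold to the elements inside the window, while keyed state carries the accumulator forward indefinitely. A minimal sketch of that shared core (names are illustrative):

```scala
// Sketch: the (sum, count) accumulator both approaches ultimately maintain.
object SumAvg {
  final case class Acc(sum: Double, count: Long) {
    def avg: Double = if (count == 0) 0.0 else sum / count
  }

  // With windows this fold runs per window pane; with keyed state, the
  // Acc lives in a ValueState and is updated on every element forever.
  def aggregate(xs: Seq[Double]): Acc =
    xs.foldLeft(Acc(0.0, 0L)) { (a, x) => Acc(a.sum + x, a.count + 1) }
}
```

The practical difference is therefore about scope and lifecycle (when results are emitted, when the accumulator is cleared) rather than about the arithmetic.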

How to filter an Apache Flink stream on the basis of another?

Submitted by 天涯浪子 on 2019-12-06 12:00:24
Question: I have two streams: one of Int and the other of JSON. The JSON schema contains one key which is some int, so I need to filter the JSON stream by comparing that key against the other, integer stream. Is this possible in Flink? Answer 1: Yes, you can do this kind of stream processing with Flink. The basic building blocks you need from Flink are connected streams and stateful functions -- here's an example using a RichCoFlatMap: import org.apache.flink.api.common.state.ValueState; import org.apache
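The answer's code is truncated; below is an untested sketch of the connected-streams pattern it describes (it needs the Flink dependency to compile). The key-extraction function and state name are illustrative; the idea is to key both streams by the shared integer and remember, per key, whether that key has been seen on the Int stream:

```scala
// Untested sketch: filter a JSON stream by keys arriving on an Int stream.
import org.apache.flink.api.common.state.{ValueState, ValueStateDescriptor}
import org.apache.flink.configuration.Configuration
import org.apache.flink.streaming.api.functions.co.RichCoFlatMapFunction
import org.apache.flink.util.Collector

class KeyFilter extends RichCoFlatMapFunction[Int, String, String] {
  // per-key flag: has this key arrived on the Int stream yet?
  private var seen: ValueState[Boolean] = _

  override def open(conf: Configuration): Unit =
    seen = getRuntimeContext.getState(
      new ValueStateDescriptor[Boolean]("seen", classOf[Boolean]))

  override def flatMap1(key: Int, out: Collector[String]): Unit =
    seen.update(true) // record the key; emit nothing

  override def flatMap2(json: String, out: Collector[String]): Unit =
    if (seen.value()) out.collect(json) // pass JSON only for known keys
}

// wiring (jsonKey extracts the int from the JSON string -- illustrative):
// ints.keyBy(i => i)
//     .connect(jsons.keyBy(jsonKey))
//     .flatMap(new KeyFilter)
```

One caveat this sketch shares with the original answer: a JSON record that arrives before its key does is dropped; buffering such records in state would be needed to handle out-of-order arrival.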

Flink Scala API functions on generic parameters

Submitted by 爷,独闯天下 on 2019-12-06 11:04:29
Question: This is a follow-up question to Flink Scala API "not enough arguments". I'd like to be able to pass Flink's DataSets around and do something with them, but the parameters of the dataset are generic. Here's the problem I have now: import org.apache.flink.api.scala.ExecutionEnvironment import org.apache.flink.api.scala._ import scala.reflect.ClassTag object TestFlink { def main(args: Array[String]) { val env = ExecutionEnvironment.getExecutionEnvironment val text = env.fromElements( "Who's there?",
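The usual fix for "not enough arguments" on generic DataSet operations is to propagate the implicit evidence Flink's Scala API needs — a `TypeInformation` for every generic element type, plus (for some operations) a `ClassTag` — as context bounds on the generic method. An untested sketch of the shape (the method `toPairs` is a made-up example):

```scala
// Untested sketch: forwarding TypeInformation/ClassTag through a generic
// method so that DataSet transformations inside it can be compiled.
import org.apache.flink.api.common.typeinfo.TypeInformation
import org.apache.flink.api.scala._
import scala.reflect.ClassTag

// Without the context bounds, ds.map cannot summon the implicit
// TypeInformation[(T, Int)] and fails with "not enough arguments".
def toPairs[T: TypeInformation : ClassTag](ds: DataSet[T]): DataSet[(T, Int)] =
  ds.map(t => (t, 1))
```

The bounds only defer the requirement: the concrete `TypeInformation` is generated by the `org.apache.flink.api.scala._` import's macros at the call site, where `T` is known.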

Flink CEP: Which method to join data streams for different type of events?

Submitted by 廉价感情. on 2019-12-06 10:45:54
Suppose that I have two different types of data streams, one providing weather data and the other providing vehicle data, and I would like to use Flink to do complex event processing on the data. Which method in Flink 1.3.x is the correct one to use? I saw different methods like union, connect, and window join. Basically I just want to try a simple CEP rule like this: IF weather is wet AND vehicle speed > 60 WITHIN the last 10 seconds THEN raise alert. Thanks! Answer: In my opinion, there are two ways you can solve this problem: use a common parent type for the different types of events and connect the two streams
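An untested sketch of the common-parent-type approach the answer starts to describe (it needs the Flink and flink-cep dependencies; the event fields, the merge via `union`, and the pattern names are illustrative choices):

```scala
// Untested sketch: merge both streams under a parent type, then run CEP.
import org.apache.flink.cep.scala.pattern.Pattern
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.time.Time

sealed trait Event
final case class Weather(wet: Boolean) extends Event
final case class Vehicle(speed: Double) extends Event

// upcast both streams to Event so they can be merged into one stream
val events: DataStream[Event] =
  weatherStream.map(w => w: Event).union(vehicleStream.map(v => v: Event))

// "wet weather followed by a fast vehicle within 10 seconds"
val pattern = Pattern.begin[Event]("wet")
  .where {
    case Weather(wet) => wet
    case _            => false
  }
  .followedBy("fast")
  .where {
    case Vehicle(speed) => speed > 60
    case _              => false
  }
  .within(Time.seconds(10))
```

`union` fits here because, after the upcast, both streams have the same element type; `connect` is the alternative when the two streams must keep distinct types and share state instead.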