apache-flink

Real-time streaming prediction in Flink using Scala

Submitted by 允我心安 on 2019-12-07 09:22:41
Question: Flink version: 1.2.0; Scala version: 2.11.8. I want to use a DataStream to make predictions with a model in Flink using Scala. I have a DataStream[String] in Flink (Scala) that contains JSON-formatted data from a Kafka source. I want to use this DataStream to predict against an already-trained FlinkML model. The problem is that all the FlinkML examples use the DataSet API for prediction. I am relatively new to Flink and Scala, so any help in the form of a code solution would be appreciated. Input: {
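A common workaround for the DataSet/DataStream mismatch above is to extract the trained model's parameters and apply them in a plain function inside a `map` on the stream. The sketch below assumes a linear model; the weights, the JSON field names (`x1`, `x2`), and the hand-rolled extractor are all hypothetical placeholders — in a real job the weights would come from the trained FlinkML model and a proper JSON library would do the parsing.

```scala
// Sketch: apply a trained linear model's weights inside a DataStream map.
// Weights/intercept are hypothetical stand-ins for values extracted from
// an already-trained FlinkML model (training still uses the DataSet API).
object StreamPredict {
  val weights: Array[Double] = Array(0.4, -1.2)
  val intercept: Double = 0.1

  // Minimal extractor for a flat JSON string such as {"x1": 2.0, "x2": 3.5};
  // use a real JSON library (e.g. for production) instead of a regex.
  def field(json: String, name: String): Double = {
    val pattern = ("\"" + name + "\"\\s*:\\s*(-?[0-9.]+)").r
    pattern.findFirstMatchIn(json).map(_.group(1).toDouble).getOrElse(0.0)
  }

  def predict(json: String): Double = {
    val features = Array(field(json, "x1"), field(json, "x2"))
    features.zip(weights).map { case (f, w) => f * w }.sum + intercept
  }

  def main(args: Array[String]): Unit =
    // In the Flink job this becomes: kafkaStream.map(predict _)
    println(predict("""{"x1": 2.0, "x2": 3.5}"""))
}
```

The pure `predict` function keeps the scoring logic testable outside Flink; only the one-line `map` wiring depends on the DataStream API.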

Flink CsvTableSource Streaming

Submitted by 允我心安 on 2019-12-07 09:21:18
Question: I want to stream a CSV file and perform SQL operations on it using Flink, but the code I have written only reads the file once and then stops; it does not stream. Thanks in advance. StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); StreamTableEnvironment tableEnv = StreamTableEnvironment.getTableEnvironment(env); CsvTableSource csvtable = CsvTableSource.builder() .path("D:/employee.csv") .ignoreFirstLine() .fieldDelimiter(",") .field("id", Types.INT()) .field("name", Types
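The behavior above is expected: CsvTableSource is a bounded source, so it reads the file once and the job finishes. To keep consuming the file as it grows, one option is to monitor it with `readFile` in `PROCESS_CONTINUOUSLY` mode and build the table from the resulting stream. The sketch below is untested (it needs the Flink dependency to compile) and the field names mirror the question's `employee.csv`:

```scala
// Untested sketch: monitor the CSV file continuously instead of using the
// bounded CsvTableSource. The 1000 ms argument is the monitoring interval.
import org.apache.flink.api.java.io.TextInputFormat
import org.apache.flink.core.fs.Path
import org.apache.flink.streaming.api.functions.source.FileProcessingMode
import org.apache.flink.streaming.api.scala._

val env = StreamExecutionEnvironment.getExecutionEnvironment
val path = "D:/employee.csv"
val lines: DataStream[String] = env.readFile(
  new TextInputFormat(new Path(path)), path,
  FileProcessingMode.PROCESS_CONTINUOUSLY, 1000L)

// parse "id,name" lines, then register as a table for SQL queries
val employees = lines.map { line =>
  val cols = line.split(",")
  (cols(0).toInt, cols(1))
}
```

Note that `PROCESS_CONTINUOUSLY` re-reads the whole file on every modification, which matters if the job must not reprocess old rows.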

Apache Flink - Send event if no data was received for x minutes

Submitted by 流过昼夜 on 2019-12-07 08:15:34
Question: How can I implement an operator with Flink's DataStream API that sends an event when no data has been received from a stream for a certain amount of time? Answer 1: Such an operator can be implemented using a ProcessFunction. DataStream<Long> input = env.fromElements(1L, 2L, 3L, 4L); input // use keyBy to have keyed state. // NullByteKeySelector will move all data to one task. You can also use other keys .keyBy(new NullByteKeySelector()) // use process function with 60 seconds timeout .process(new
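The answer's code is cut off before the ProcessFunction body. A sketch of what such a function typically looks like is below — untested (it needs the Flink dependency), with the 60-second timeout and the emitted message being illustrative choices:

```scala
// Untested sketch: emit a marker event when no element arrives for 60 s.
// Each element pushes a processing-time timer 60 s into the future and
// cancels the previous one; if a timer ever fires, the stream was silent.
import org.apache.flink.api.common.state.{ValueState, ValueStateDescriptor}
import org.apache.flink.configuration.Configuration
import org.apache.flink.streaming.api.functions.ProcessFunction
import org.apache.flink.util.Collector

class TimeoutFunction extends ProcessFunction[Long, String] {
  private var lastTimer: ValueState[Long] = _

  override def open(conf: Configuration): Unit =
    lastTimer = getRuntimeContext.getState(
      new ValueStateDescriptor[Long]("lastTimer", classOf[Long]))

  override def processElement(value: Long,
      ctx: ProcessFunction[Long, String]#Context,
      out: Collector[String]): Unit = {
    val next = ctx.timerService().currentProcessingTime() + 60000L
    if (lastTimer.value() != 0L)
      ctx.timerService().deleteProcessingTimeTimer(lastTimer.value())
    ctx.timerService().registerProcessingTimeTimer(next)
    lastTimer.update(next)
  }

  override def onTimer(timestamp: Long,
      ctx: ProcessFunction[Long, String]#OnTimerContext,
      out: Collector[String]): Unit =
    out.collect(s"no data received for 60 s (timer fired at $timestamp)")
}
```

Deleting the previous timer on each element keeps the timer count bounded; without it, every element would leave a pending timer behind.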

Apache Flink: What's the difference between side outputs and split() in the DataStream API?

Submitted by 时间秒杀一切 on 2019-12-07 00:08:14
Question: Apache Flink has a split API that lets you branch data streams: val splited = datastream.split { i => i match { case i if ... => Seq("red", "blue") case _ => Seq("green") }} splited.select("green").flatMap { .... } It also provides another approach called side outputs ( https://ci.apache.org/projects/flink/flink-docs-release-1.5/dev/stream/side_output.html ) that lets you do the same thing. What's the difference between these two approaches? Do they use the same lower-level construct? Do they
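For comparison with the `split` snippet above, this is roughly what the same routing looks like with side outputs — untested (needs the Flink dependency), with the tag names chosen to mirror the question. Unlike `split`, side outputs may carry a different element type than the main stream, and they work from any ProcessFunction:

```scala
// Untested sketch: side-output equivalent of the split/select example.
import org.apache.flink.streaming.api.functions.ProcessFunction
import org.apache.flink.streaming.api.scala._
import org.apache.flink.util.Collector

val redBlue = OutputTag[Int]("red-blue") // tag name is illustrative

val green: DataStream[Int] = datastream.process(
  new ProcessFunction[Int, Int] {
    override def processElement(i: Int,
        ctx: ProcessFunction[Int, Int]#Context,
        out: Collector[Int]): Unit =
      if (condition(i)) ctx.output(redBlue, i) // to the side output
      else out.collect(i)                      // to the main ("green") stream
  })

val redBlueStream: DataStream[Int] = green.getSideOutput(redBlue)
```

In later Flink releases `split` was deprecated in favor of side outputs, partly because consecutive `split` calls on the same stream behaved unintuitively.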

Apache Flink - custom Java options are not recognized inside job

Submitted by 可紊 on 2019-12-06 22:22:36
Question: I've added the following line to flink-conf.yaml: env.java.opts: "-Ddy.props.path=/PATH/TO/PROPS/FILE". When starting the jobmanager (jobmanager.sh start cluster) I see in the logs that the JVM option is indeed recognized: 2017-02-20 12:19:23,536 INFO org.apache.flink.runtime.jobmanager.JobManager - JVM Options: 2017-02-20 12:19:23,536 INFO org.apache.flink.runtime.jobmanager.JobManager - -Xms256m 2017-02-20 12:19:23,536 INFO org.apache.flink.runtime.jobmanager.JobManager - -Xmx256m 2017-02-20 12:19:23
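One thing worth checking in this situation: `env.java.opts` only affects the JVMs that the cluster scripts start (JobManager and TaskManagers). Code that runs in the client JVM during `flink run` — for example, anything executed while building the job graph — does not see it, so the system property must also be passed to the client (e.g. via the `JVM_ARGS` environment variable) or read through Flink's configuration/parameter mechanisms instead. A config fragment illustrating the cluster-side options (the path is the question's placeholder):

```yaml
# flink-conf.yaml -- applies to JVMs started by the cluster scripts
env.java.opts: "-Ddy.props.path=/PATH/TO/PROPS/FILE"

# per-process variants also exist, if the option should only apply
# to one side of the cluster:
env.java.opts.jobmanager: "-Ddy.props.path=/PATH/TO/PROPS/FILE"
env.java.opts.taskmanager: "-Ddy.props.path=/PATH/TO/PROPS/FILE"
```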

Apache Beam Counter/Metrics not available in Flink WebUI

Submitted by 时光怂恿深爱的人放手 on 2019-12-06 20:29:32
Question: I'm using Flink 1.4.1 and Beam 2.3.0, and would like to know whether it is possible to have metrics available in the Flink WebUI (or anywhere at all), as in the Dataflow WebUI. I've used a counter like: import org.apache.beam.sdk.metrics.Counter; import org.apache.beam.sdk.metrics.Metrics; ... Counter elementsRead = Metrics.counter(getClass(), "elements_read"); ... elementsRead.inc(); but I can't find the "elements_read" counts anywhere (Task Metrics or Accumulators) in the Flink WebUI. I thought this will

Differences between working with states and windows (time) in Flink streaming

Submitted by 为君一笑 on 2019-12-06 16:09:04
Question: Let's say we want to compute the sum and average of the items, and can work either with states or with windows (time). Example working with windows - https://ci.apache.org/projects/flink/flink-docs-release-0.10/apis/streaming_guide.html#example-program Example working with states - https://github.com/dataArtisans/flink-training-exercises/blob/master/src/main/java/com/dataartisans/flinktraining/exercises/datastream_java/ride_speed/RideSpeed.java Can I ask what would be the reasons to make
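Whichever mechanism drives the computation, the aggregation itself is the same fold over a (sum, count) accumulator: a window bounds the fold to the elements inside the window, while keyed state carries the accumulator forward indefinitely. A minimal sketch of that shared core (names are illustrative):

```scala
// Sketch: the (sum, count) accumulator both approaches ultimately maintain.
object SumAvg {
  final case class Acc(sum: Double, count: Long) {
    def avg: Double = if (count == 0) 0.0 else sum / count
  }

  // With windows this fold runs per window pane; with keyed state, the
  // Acc lives in a ValueState and is updated on every element forever.
  def aggregate(xs: Seq[Double]): Acc =
    xs.foldLeft(Acc(0.0, 0L)) { (a, x) => Acc(a.sum + x, a.count + 1) }
}
```

The practical difference is therefore about scope and lifecycle (when results are emitted, when the accumulator is cleared) rather than about the arithmetic.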

How to filter an Apache Flink stream on the basis of another?

Submitted by 天涯浪子 on 2019-12-06 12:00:24
Question: I have two streams: one of Int and the other of JSON. The JSON schema contains one key which is some int, so I need to filter the JSON stream by comparing that key against the other, integer stream. Is this possible in Flink? Answer 1: Yes, you can do this kind of stream processing with Flink. The basic building blocks you need from Flink are connected streams and stateful functions -- here's an example using a RichCoFlatMap: import org.apache.flink.api.common.state.ValueState; import org.apache
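The answer's code is truncated; below is an untested sketch of the connected-streams pattern it describes (it needs the Flink dependency to compile). The key-extraction function and state name are illustrative; the idea is to key both streams by the shared integer and remember, per key, whether that key has been seen on the Int stream:

```scala
// Untested sketch: filter a JSON stream by keys arriving on an Int stream.
import org.apache.flink.api.common.state.{ValueState, ValueStateDescriptor}
import org.apache.flink.configuration.Configuration
import org.apache.flink.streaming.api.functions.co.RichCoFlatMapFunction
import org.apache.flink.util.Collector

class KeyFilter extends RichCoFlatMapFunction[Int, String, String] {
  // per-key flag: has this key arrived on the Int stream yet?
  private var seen: ValueState[Boolean] = _

  override def open(conf: Configuration): Unit =
    seen = getRuntimeContext.getState(
      new ValueStateDescriptor[Boolean]("seen", classOf[Boolean]))

  override def flatMap1(key: Int, out: Collector[String]): Unit =
    seen.update(true) // record the key; emit nothing

  override def flatMap2(json: String, out: Collector[String]): Unit =
    if (seen.value()) out.collect(json) // pass JSON only for known keys
}

// wiring (jsonKey extracts the int from the JSON string -- illustrative):
// ints.keyBy(i => i)
//     .connect(jsons.keyBy(jsonKey))
//     .flatMap(new KeyFilter)
```

One caveat this sketch shares with the original answer: a JSON record that arrives before its key does is dropped; buffering such records in state would be needed to handle out-of-order arrival.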

Flink Scala API functions on generic parameters

Submitted by 爷,独闯天下 on 2019-12-06 11:04:29
Question: This is a follow-up question to Flink Scala API "not enough arguments". I'd like to be able to pass Flink's DataSets around and do something with them, but the parameters of the dataset are generic. Here's the problem I have now: import org.apache.flink.api.scala.ExecutionEnvironment import org.apache.flink.api.scala._ import scala.reflect.ClassTag object TestFlink { def main(args: Array[String]) { val env = ExecutionEnvironment.getExecutionEnvironment val text = env.fromElements( "Who's there?",
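The usual fix for "not enough arguments" on generic DataSet operations is to propagate the implicit evidence Flink's Scala API needs — a `TypeInformation` for every generic element type, plus (for some operations) a `ClassTag` — as context bounds on the generic method. An untested sketch of the shape (the method `toPairs` is a made-up example):

```scala
// Untested sketch: forwarding TypeInformation/ClassTag through a generic
// method so that DataSet transformations inside it can be compiled.
import org.apache.flink.api.common.typeinfo.TypeInformation
import org.apache.flink.api.scala._
import scala.reflect.ClassTag

// Without the context bounds, ds.map cannot summon the implicit
// TypeInformation[(T, Int)] and fails with "not enough arguments".
def toPairs[T: TypeInformation : ClassTag](ds: DataSet[T]): DataSet[(T, Int)] =
  ds.map(t => (t, 1))
```

The bounds only defer the requirement: the concrete `TypeInformation` is generated by the `org.apache.flink.api.scala._` import's macros at the call site, where `T` is known.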

Flink CEP: Which method to join data streams for different type of events?

Submitted by 廉价感情. on 2019-12-06 10:45:54
Suppose that I have two different types of data streams, one providing weather data and the other providing vehicle data, and I would like to use Flink to do complex event processing on the data. Which method in Flink 1.3.x is the correct one to use? I saw different methods like union, connect, and window join. Basically I just want to try a simple CEP rule like this: IF weather is wet AND vehicle speed > 60 WITHIN the last 10 seconds THEN raise alert. Thanks! Answer: In my opinion, there are two ways you can solve this problem: use a common parent type for the different types of events and connect the two streams
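An untested sketch of the common-parent-type approach the answer starts to describe (it needs the Flink and flink-cep dependencies; the event fields, the merge via `union`, and the pattern names are illustrative choices):

```scala
// Untested sketch: merge both streams under a parent type, then run CEP.
import org.apache.flink.cep.scala.pattern.Pattern
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.time.Time

sealed trait Event
final case class Weather(wet: Boolean) extends Event
final case class Vehicle(speed: Double) extends Event

// upcast both streams to Event so they can be merged into one stream
val events: DataStream[Event] =
  weatherStream.map(w => w: Event).union(vehicleStream.map(v => v: Event))

// "wet weather followed by a fast vehicle within 10 seconds"
val pattern = Pattern.begin[Event]("wet")
  .where {
    case Weather(wet) => wet
    case _            => false
  }
  .followedBy("fast")
  .where {
    case Vehicle(speed) => speed > 60
    case _              => false
  }
  .within(Time.seconds(10))
```

`union` fits here because, after the upcast, both streams have the same element type; `connect` is the alternative when the two streams must keep distinct types and share state instead.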