apache-flink

Apache Beam Counter/Metrics not available in Flink WebUI

心已入冬 submitted on 2019-12-05 01:29:08
I'm using Flink 1.4.1 and Beam 2.3.0, and would like to know whether it is possible to have metrics available in the Flink WebUI (or anywhere at all), as in the Dataflow WebUI. I've used a counter like this:

import org.apache.beam.sdk.metrics.Counter;
import org.apache.beam.sdk.metrics.Metrics;
...
Counter elementsRead = Metrics.counter(getClass(), "elements_read");
...
elementsRead.inc();

but I can't find "elements_read" counts anywhere (Task Metrics or Accumulators) in the Flink WebUI. I thought this would be straightforward after BEAM-773. Once you have selected a job in your dashboard, you will see the
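For context, a minimal sketch of how such a counter is typically wired into a Beam DoFn; the CountingFn class and its pass-through behaviour are illustrative, not taken from the question:

import org.apache.beam.sdk.metrics.Counter;
import org.apache.beam.sdk.metrics.Metrics;
import org.apache.beam.sdk.transforms.DoFn;

public class CountingFn extends DoFn<String, String> {
  // Named counter; the metric is reported under the step that runs this DoFn.
  private final Counter elementsRead = Metrics.counter(CountingFn.class, "elements_read");

  @ProcessElement
  public void processElement(ProcessContext ctx) {
    elementsRead.inc();            // count every element that passes through
    ctx.output(ctx.element());     // emit the element unchanged
  }
}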

Differences between working with state and windows (time) in Flink streaming

这一生的挚爱 submitted on 2019-12-04 23:06:19
Let's say we want to compute the sum and average of the items, and can either work with state or with windows (time).
Example working with windows - https://ci.apache.org/projects/flink/flink-docs-release-0.10/apis/streaming_guide.html#example-program
Example working with states - https://github.com/dataArtisans/flink-training-exercises/blob/master/src/main/java/com/dataartisans/flinktraining/exercises/datastream_java/ride_speed/RideSpeed.java
What would be the reasons for choosing one approach over the other? Can I infer that if the data arrives very irregularly (50% comes in the defined window length and
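As a point of reference, here is a minimal window-based sketch of the sum side of that computation; the processing-time tumbling windows, the 10-second window size, and the hard-coded sample input are all illustrative choices:

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;

public class WindowedSum {
  public static void main(String[] args) throws Exception {
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

    // (key, value) pairs standing in for the incoming items.
    DataStream<Tuple2<String, Integer>> items = env.fromElements(
        Tuple2.of("a", 1), Tuple2.of("a", 2), Tuple2.of("b", 5));

    items
        .keyBy(0)                          // partition by the key field
        .timeWindow(Time.seconds(10))      // results are emitted once per window
        .sum(1)                            // sum the value field within each window
        .print();

    env.execute("windowed sum");
  }
}

With the state-based variant, the running sum would instead live in keyed state and be updated (and possibly emitted) on every element, which is what makes it a more natural fit for irregularly arriving data.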

Apache Flink streaming in a cluster does not split jobs across workers

喜欢而已 submitted on 2019-12-04 21:43:23
Question: My objective is to set up a high-throughput cluster using Kafka as the source and Flink as the stream processing engine. Here's what I have done. I have set up a 2-node cluster with the following configuration on the master and the slave.

Master flink-conf.yaml:
jobmanager.rpc.address: <MASTER_IP_ADDR> #localhost
jobmanager.rpc.port: 6123
jobmanager.heap.mb: 256
taskmanager.heap.mb: 512
taskmanager.numberOfTaskSlots: 50
parallelism.default: 100

Slave flink-conf.yaml:
jobmanager.rpc.address: <MASTER_IP_ADDR>
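For reference, a small sketch of how the job itself can also set parallelism on top of this configuration; the numbers are illustrative and assume the two TaskManagers with 50 slots each described above:

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ParallelismSketch {
  public static void main(String[] args) throws Exception {
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

    // Job-wide parallelism; overrides parallelism.default and must fit into
    // the total number of task slots (2 TaskManagers x 50 slots = 100).
    env.setParallelism(100);

    env.fromElements("a", "b", "c")
        .map(new MapFunction<String, String>() {
          @Override
          public String map(String value) {
            return value.toUpperCase();
          }
        })
        .setParallelism(10)   // an individual operator can run with lower parallelism
        .print();

    env.execute("parallelism sketch");
  }
}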

Apache Flink (v1.6.0) authenticate Elasticsearch Sink (v6.4)

空扰寡人 submitted on 2019-12-04 20:30:35
I am using Apache Flink v1.6.0 and I am trying to write to Elasticsearch v6.4.0, which is hosted in Elastic Cloud. I am having issues authenticating to the Elastic Cloud cluster. I have been able to get Flink to write to a local Elasticsearch v6.4.0 node, which does not use encryption, with the following code:

/* Elasticsearch Configuration */
List<HttpHost> httpHosts = new ArrayList<>();
httpHosts.add(new HttpHost("127.0.0.1", 9200, "http"));

// use an ElasticsearchSink.Builder to create an ElasticsearchSink
ElasticsearchSink.Builder<ObjectNode> esSinkBuilder = new ElasticsearchSink
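One common approach (a sketch, not verified against Elastic Cloud) is to add basic authentication through the connector's RestClientFactory; the esSinkBuilder variable is the one from the snippet above, and the username/password values are placeholders:

import org.apache.http.auth.AuthScope;
import org.apache.http.auth.UsernamePasswordCredentials;
import org.apache.http.impl.client.BasicCredentialsProvider;

// Configure the underlying Elasticsearch REST client with basic-auth credentials.
esSinkBuilder.setRestClientFactory(restClientBuilder ->
    restClientBuilder.setHttpClientConfigCallback(httpClientBuilder -> {
      BasicCredentialsProvider credentialsProvider = new BasicCredentialsProvider();
      credentialsProvider.setCredentials(AuthScope.ANY,
          new UsernamePasswordCredentials("<username>", "<password>"));
      return httpClientBuilder.setDefaultCredentialsProvider(credentialsProvider);
    }));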

Flink and Dynamic templates recognition

。_饼干妹妹 submitted on 2019-12-04 19:07:15
We plan to use Flink CEP to process a large number of events according to dynamic templates. The system must recognize chains of events (sometimes complicated chains with conditions and grouping). The templates will be created by the user; in other words, we have to create complicated templates without touching the code. Is it possible to use Apache Flink to solve this problem? Does Flink support dynamic templates? At the moment Flink's CEP library does not support this kind of dynamic rule adaptation. However, there is no fundamental reason that makes it impossible to implement. In fact,
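To illustrate what is currently supported, here is a minimal statically defined CEP pattern; the Event type, its getType() accessor, and the eventStream variable are hypothetical:

import org.apache.flink.cep.CEP;
import org.apache.flink.cep.PatternStream;
import org.apache.flink.cep.pattern.Pattern;
import org.apache.flink.cep.pattern.conditions.SimpleCondition;

// The pattern is compiled into the job at build time; changing it means redeploying the job.
Pattern<Event, ?> pattern = Pattern.<Event>begin("first")
    .where(new SimpleCondition<Event>() {
      @Override
      public boolean filter(Event e) {
        return "login".equals(e.getType());
      }
    })
    .next("second")
    .where(new SimpleCondition<Event>() {
      @Override
      public boolean filter(Event e) {
        return "purchase".equals(e.getType());
      }
    });

PatternStream<Event> matches = CEP.pattern(eventStream, pattern);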

What does “streaming” mean in Apache Spark and Apache Flink?

醉酒当歌 submitted on 2019-12-04 19:03:57
Question: On the Apache Spark Streaming website I saw the sentence: "Spark Streaming makes it easy to build scalable fault-tolerant streaming applications." And on the Apache Flink website there is the sentence: "Apache Flink is an open source platform for scalable batch and stream data processing." What do "streaming application", "batch data processing", and "stream data processing" mean? Can you give some concrete examples? Are they designed for sensor data? Answer 1: Streaming data analysis (in contrast to "batch"

Global sorting in Apache Flink

懵懂的女人 submitted on 2019-12-04 18:50:46
Question: The sortPartition method of a DataSet sorts the dataset locally based on some specified fields. How can I get my large DataSet sorted globally in an efficient way in Flink? Answer 1: This is currently not easily possible because Flink does not yet provide a built-in range partitioning strategy. A work-around is to implement a custom Partitioner:

DataSet<Tuple2<Long, Long>> data = ...
data
  .partitionCustom(new Partitioner<Long>() {
    int partition(Long key, int numPartitions) {
      // your implementation
    }
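A rough sketch of how this work-around could be completed, building on the data variable above; the hard-coded split point of 1000 stands in for split points that a real range partitioner would derive from a sample of the data:

import org.apache.flink.api.common.functions.Partitioner;
import org.apache.flink.api.common.operators.Order;

DataSet<Tuple2<Long, Long>> globallySorted = data
    .partitionCustom(new Partitioner<Long>() {
      @Override
      public int partition(Long key, int numPartitions) {
        // Route small keys to partition 0 and everything else to the last partition,
        // so that partition index order reflects key order.
        return key < 1000L ? 0 : numPartitions - 1;
      }
    }, 0)                                  // partition on field 0 of the Tuple2
    .sortPartition(0, Order.ASCENDING);    // then sort within each partition

Reading the partitions back in index order then yields a globally sorted result.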

How to filter an Apache Flink stream on the basis of another?

大憨熊 submitted on 2019-12-04 16:56:20
I have two streams: one of Int and the other of JSON. In the JSON schema there is one key which is some int, so I need to filter the JSON stream by comparing that key with the other integer stream. Is this possible in Flink? Yes, you can do this kind of stream processing with Flink. The basic building blocks you need from Flink are connected streams and stateful functions -- here's an example using a RichCoFlatMap:

import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.typeinfo.TypeHint;
import org
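Since the original example is cut off, here is an independent sketch of the same idea; the JSON records are represented as Jackson ObjectNode values and the "id" field name is a placeholder:

import com.fasterxml.jackson.databind.node.ObjectNode;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.co.RichCoFlatMapFunction;
import org.apache.flink.util.Collector;

public class FilterByControlStream extends RichCoFlatMapFunction<Integer, ObjectNode, ObjectNode> {
  // Latest control value seen for the current key.
  private transient ValueState<Integer> allowedKey;

  @Override
  public void open(Configuration config) {
    allowedKey = getRuntimeContext().getState(
        new ValueStateDescriptor<>("allowedKey", Integer.class));
  }

  @Override
  public void flatMap1(Integer control, Collector<ObjectNode> out) throws Exception {
    allowedKey.update(control);                    // remember the latest control value
  }

  @Override
  public void flatMap2(ObjectNode record, Collector<ObjectNode> out) throws Exception {
    Integer allowed = allowedKey.value();
    if (allowed != null && record.has("id") && allowed == record.get("id").asInt()) {
      out.collect(record);                         // the JSON key matches: emit the record
    }
  }
}

To use keyed state, both inputs have to be keyed on the same value before being connected, along the lines of intStream.keyBy(v -> v).connect(jsonStream.keyBy(r -> r.get("id").asInt())).flatMap(new FilterByControlStream()).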

Flink Scala API functions on generic parameters

放肆的年华 submitted on 2019-12-04 16:35:51
This is a follow-up question to Flink Scala API "not enough arguments". I'd like to be able to pass Flink's DataSets around and do something with them, but the type parameters of the DataSet are generic. Here's the problem I have now:

import org.apache.flink.api.scala.ExecutionEnvironment
import org.apache.flink.api.scala._
import scala.reflect.ClassTag

object TestFlink {
  def main(args: Array[String]) {
    val env = ExecutionEnvironment.getExecutionEnvironment
    val text = env.fromElements(
      "Who's there?",
      "I think I hear them. Stand, ho! Who's there?")
    val split = text.flatMap { _.toLowerCase.split("\\W+"

Keep keyed state across multiple transformations

冷暖自知 submitted on 2019-12-04 15:08:51
I have a stream that I want to partition by a certain key and then run through several transformations, each using state. When I call keyBy() I get a KeyedStream, and the next transformation can access partitioned state correctly, but another transformation chained after that gets an exception when trying to access partitioned state. The exception is: "State key serializer has not been configured in the config. This operation cannot use partitioned state." It seems that the key information is only passed to the first transformation and not further down the chain. The code I am trying to run is
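For reference, a minimal runnable sketch of the usual fix: the output of a map() is a plain DataStream again, so each stateful operator needs its own keyBy(). The CountingMap and the sample input are illustrative, not the asker's code:

import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class RekeyExample {

  // A stateful map that counts, per key, how many records it has seen.
  static class CountingMap extends RichMapFunction<Tuple2<String, Long>, Tuple2<String, Long>> {
    private transient ValueState<Long> count;

    @Override
    public void open(Configuration config) {
      count = getRuntimeContext().getState(new ValueStateDescriptor<>("count", Long.class));
    }

    @Override
    public Tuple2<String, Long> map(Tuple2<String, Long> in) throws Exception {
      Long seen = count.value();
      count.update(seen == null ? 1L : seen + 1L);
      return in;
    }
  }

  public static void main(String[] args) throws Exception {
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    DataStream<Tuple2<String, Long>> input =
        env.fromElements(Tuple2.of("a", 1L), Tuple2.of("a", 2L), Tuple2.of("b", 3L));

    input
        .keyBy(0)                   // keys the first stateful map
        .map(new CountingMap())
        .keyBy(0)                   // re-key before the second stateful map
        .map(new CountingMap())
        .print();

    env.execute("re-key between stateful operators");
  }
}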