apache-flink

Flink Custom Trigger giving Unexpected Output

Submitted by 白昼怎懂夜的黑 on 2019-12-12 02:45:26
Question: I want to create a Trigger that fires for the first time after 20 seconds and then every five seconds after that. I have used GlobalWindows and a custom Trigger:

val windowedStream = valueStream
  .keyBy(0)
  .window(GlobalWindows.create())
  .trigger(TradeTrigger.of())

Here is the code in TradeTrigger:

@PublicEvolving
public class TradeTrigger<W extends Window> extends Trigger<Object, W> {
    private static final long serialVersionUID = 1L;
    static boolean flag = false;
    static long ctime = System
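For reference, below is a rough sketch of one way such a trigger could be written without static fields (statics are shared across all keys and tasks and are generally unsafe in Flink), using per-window state and processing-time timers instead. The class name, state name, and the 20-second/5-second values are assumptions for illustration, not the asker's actual implementation.

import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.streaming.api.windowing.triggers.Trigger;
import org.apache.flink.streaming.api.windowing.triggers.TriggerResult;
import org.apache.flink.streaming.api.windowing.windows.Window;

public class PeriodicAfterDelayTrigger<W extends Window> extends Trigger<Object, W> {
    private static final long serialVersionUID = 1L;
    private static final long INITIAL_DELAY_MS = 20_000L; // assumed delay before the first firing
    private static final long INTERVAL_MS = 5_000L;       // assumed interval between later firings

    private final ValueStateDescriptor<Boolean> startedDesc =
            new ValueStateDescriptor<>("timerStarted", Boolean.class);

    @Override
    public TriggerResult onElement(Object element, long timestamp, W window, TriggerContext ctx) throws Exception {
        ValueState<Boolean> started = ctx.getPartitionedState(startedDesc);
        if (started.value() == null) {
            // first element for this key/window: schedule the first firing 20 seconds from now
            started.update(true);
            ctx.registerProcessingTimeTimer(ctx.getCurrentProcessingTime() + INITIAL_DELAY_MS);
        }
        return TriggerResult.CONTINUE;
    }

    @Override
    public TriggerResult onProcessingTime(long time, W window, TriggerContext ctx) {
        // fire now and re-arm the timer for the next 5-second interval
        ctx.registerProcessingTimeTimer(time + INTERVAL_MS);
        return TriggerResult.FIRE;
    }

    @Override
    public TriggerResult onEventTime(long time, W window, TriggerContext ctx) {
        return TriggerResult.CONTINUE;
    }

    @Override
    public void clear(W window, TriggerContext ctx) throws Exception {
        ctx.getPartitionedState(startedDesc).clear();
    }
}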

StreamingFileSink not ingesting data to S3

Submitted by ↘锁芯ラ on 2019-12-12 01:28:47
Question: I have created a simple ingestion service that picks up on-premise files and ingests them into S3 using StreamingFileSink. https://ci.apache.org/projects/flink/flink-docs-stable/dev/connectors/streamfile_sink.html I have set everything up as per the documentation, but it is not working. I tested with the sink location pointing to another local on-prem path, and the files do get there (but hidden as .part files). Does this mean part files are also sent to S3 but are not visible? ...

final StreamExecutionEnvironment env =
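The hidden .part files are expected behaviour: StreamingFileSink keeps files in an in-progress state and only finalizes them when a checkpoint completes, so without checkpointing enabled nothing ever becomes a visible, finished file. A rough sketch of the usual wiring, where the checkpoint interval, bucket path, and s3a URL are placeholder assumptions and `lines` stands for whatever DataStream<String> the ingestion service produces:

import org.apache.flink.api.common.serialization.SimpleStringEncoder;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;

final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.enableCheckpointing(60_000); // part files are finalized on each completed checkpoint

final StreamingFileSink<String> sink = StreamingFileSink
        .forRowFormat(new Path("s3a://my-bucket/ingest"), new SimpleStringEncoder<String>("UTF-8"))
        .build();

lines.addSink(sink);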

Should the entire cluster be restarted if a single Task Manager crashes?

Submitted by 帅比萌擦擦* on 2019-12-12 01:27:15
Question: We're running a standalone Flink cluster with 2 Job Managers and 3 Task Managers. Whenever a TM crashes, we simply restart that particular TM and proceed with the processing. But reading the comments on this question makes it look like we need to restart all 5 nodes that form the cluster to deal with the failure of a single TM. Am I reading this right? What would be the consequences if we restart just the crashed TM and let the healthy ones run as is?

Answer 1: Sorry if my answer elsewhere was
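As a side note, on a standalone cluster a single crashed TaskManager can normally be brought back with the standard scripts on that host alone, while the JobManagers and the healthy TaskManagers keep running; affected jobs are then restarted according to the configured restart strategy. A rough sketch of the commands, run from the Flink distribution directory on the affected host:

# only if a half-dead TaskManager process is still around
./bin/taskmanager.sh stop
# bring a fresh TaskManager back up on this host
./bin/taskmanager.sh start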

Apache Flink - Mini cluster - Windowing operator execution problem

Submitted by 倖福魔咒の on 2019-12-12 01:25:28
Question: This turned out to be the problem behind the question below: Apache flink - job simple windowing problem - java.lang.RuntimeException: segment has been freed - Mini Cluster problem. So I wanted to ask again with specific details. Adding a very simple windowing operator to the job causes the error below in a MINI CLUSTER environment:

Caused by: java.lang.RuntimeException: segment has been freed
    at org.apache.flink.streaming.runtime.io.RecordWriterOutput.emitWatermark(RecordWriterOutput.java:123)
    at org.apache
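For context, here is a minimal sketch of the kind of "very simple windowing operator" being described, run against a local mini cluster; the element values, key, and the 5-second window size are placeholders rather than the asker's actual job:

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;

final StreamExecutionEnvironment env = StreamExecutionEnvironment.createLocalEnvironment();

// a trivial keyed processing-time window over two elements
env.fromElements(Tuple2.of("key", 1L), Tuple2.of("key", 2L))
   .keyBy(0)
   .timeWindow(Time.seconds(5))
   .sum(1)
   .print();

env.execute("mini-cluster-window-test");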

How to utilize Flink's TestHarness class?

Submitted by 眉间皱痕 on 2019-12-11 18:49:07
Question: I need to test a CoFlatMapFunction that shares state. Through my reading I have come to the conclusion that I should use the TestHarness class, per: https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/testing.html#testing-checkpointing-and-state-handling Since it is not part of the public API, I cannot figure out how to import it without copying and pasting the class itself. I thought it might be in flink-test-utils-junit, but it was not there either.

Answer 1: You'll need to add these 4 dependencies to
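For reference, the testing page linked above (for Flink versions of that era) documents pulling the harness classes in through test-scoped artifacts roughly like the following; the Scala suffix and the version property are assumptions to adapt to your build, and the answer's exact list of four dependencies is not reproduced here:

<!-- test-scoped artifacts that carry the test harness classes -->
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-test-utils_2.11</artifactId>
  <version>${flink.version}</version>
  <scope>test</scope>
</dependency>
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-runtime_2.11</artifactId>
  <version>${flink.version}</version>
  <classifier>tests</classifier>
  <scope>test</scope>
</dependency>
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-streaming-java_2.11</artifactId>
  <version>${flink.version}</version>
  <classifier>tests</classifier>
  <scope>test</scope>
</dependency>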

How to generate a dynamic path for a DataSet during output

Submitted by 邮差的信 on 2019-12-11 18:44:00
Question: Is there a way to create a dynamic DataSink output path in Flink? The DataSet has elements of type Tuple2<String, String>. When we used the streaming API I had a way to generate a dynamic path with a custom Bucketer like below:

@Override
public Path getBucketPath(Clock clock, Path basePath, Tuple2<String, String> element) {
    return new Path(basePath + "/schema=" + element.f0.toLowerCase().trim() + "/");
}

I would like to know whether there is a similar way to generate a custom path for a DataSet.

Answer 1: I
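Since the DataSet API has no built-in bucketing sink, one common workaround is to collect the distinct key values first and then write one filtered DataSet per derived path. A rough sketch building on the question's Tuple2<String, String> data; `data`, `basePath`, and `env` are assumed to already exist, and the number of distinct schemas is assumed to be small:

import java.util.List;
import org.apache.flink.api.common.typeinfo.Types;

List<String> schemas = data
        .map(t -> t.f0.toLowerCase().trim()).returns(Types.STRING)
        .distinct()
        .collect(); // triggers a small job just to learn the schema names

for (String schema : schemas) {
    data.filter(t -> schema.equals(t.f0.toLowerCase().trim()))
        .writeAsCsv(basePath + "/schema=" + schema + "/part", "\n", ",");
}
env.execute("write-per-schema");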

Flink: Write tuples with CSV header into file

Submitted by 一笑奈何 on 2019-12-11 18:38:53
Question: I did some data processing using Flink (1.7.1 with Hadoop). At the end I'd like to write the dataset, consisting of 2-tuples, into a file. Currently, I am doing it like this:

DataSet<Tuple2<Integer, Point>> pointsClustered = points.getClusteredPoints(...);
pointsClustered.writeAsCsv(params.get("output"), "\n", ",");

However, I would like to have the CSV headers written into the first line. Flink's Javadoc API doesn't state any options for this. Furthermore, I couldn't find any solution by googling
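writeAsCsv indeed offers no header option; one workaround is to format the rows as strings yourself and prepend the header in a single-parallelism mapPartition so it stays at the top of the single output file. A rough sketch reusing the names from the question; the header text and the assumption that Point has a usable toString() are placeholders:

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.functions.MapPartitionFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.util.Collector;

DataSet<String> rows = pointsClustered.map(new MapFunction<Tuple2<Integer, Point>, String>() {
    @Override
    public String map(Tuple2<Integer, Point> t) {
        return t.f0 + "," + t.f1; // relies on Point.toString() producing the desired CSV fields
    }
});

rows.mapPartition(new MapPartitionFunction<String, String>() {
        @Override
        public void mapPartition(Iterable<String> values, Collector<String> out) {
            out.collect("cluster_id,point"); // assumed header line
            for (String row : values) {
                out.collect(row);
            }
        }
    })
    .setParallelism(1)
    .writeAsText(params.get("output"))
    .setParallelism(1);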

Starting a batch process from a stream job

Submitted by 北慕城南 on 2019-12-11 18:17:38
Question: Hi, I have a Maven project for Flink stream processing. Based on the message I get from the stream, I start a batch process, but currently I am getting an error. I am pretty new to this Flink world, so please let me know if you have any idea. Here is the code I am using to start a standalone cluster.

final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
KafkaConsumerService kafkaConsumerService = new KafkaConsumerService();
FlinkKafkaConsumer010<String>
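Without the full error it is hard to say more, but for context here is a minimal, hedged sketch of the streaming side being described; the topic name, bootstrap servers, and group id are placeholders, and how the batch job is then launched is deliberately left out since that is exactly what the question is about:

import java.util.Properties;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010;

Properties props = new Properties();
props.setProperty("bootstrap.servers", "localhost:9092"); // placeholder
props.setProperty("group.id", "batch-trigger");           // placeholder

final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

DataStream<String> control = env.addSource(
        new FlinkKafkaConsumer010<>("control-topic", new SimpleStringSchema(), props));

// react to the control message; the batch work itself is better submitted as a separate job
control.map(msg -> "batch requested by message: " + msg).print();

env.execute("stream-that-triggers-batch");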

Trouble with deserializing Avro data in Scala

Submitted by 若如初见. on 2019-12-11 18:06:16
Question: I am building an Apache Flink application in Scala which reads streaming data from a Kafka bus and then performs summarizing operations on it. The data from Kafka is in Avro format and needs a special deserialization class. I found this Scala class AvroDeserializationSchema (http://codegists.com/snippet/scala/avrodeserializationschemascala_saveveltri_scala):

package org.myorg.quickstart
import org.apache.avro.io.BinaryDecoder
import org.apache.avro.io.DatumReader
import org.apache.avro.io
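For comparison, here is the usual hand-rolled pattern for such a schema, sketched in Java for consistency with the other snippets on this page (newer Flink versions also ship an AvroDeserializationSchema in the flink-avro module, which may remove the need for a custom class). MyAvroRecord is a placeholder for an Avro-generated SpecificRecord class, and the DeserializationSchema package differs between older and newer Flink versions:

import java.io.IOException;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.specific.SpecificDatumReader;
import org.apache.flink.api.common.serialization.DeserializationSchema;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.java.typeutils.TypeExtractor;

public class MyAvroDeserializationSchema implements DeserializationSchema<MyAvroRecord> {
    private static final long serialVersionUID = 1L;
    private transient SpecificDatumReader<MyAvroRecord> reader;
    private transient BinaryDecoder decoder;

    @Override
    public MyAvroRecord deserialize(byte[] message) throws IOException {
        if (reader == null) {
            reader = new SpecificDatumReader<>(MyAvroRecord.class); // lazy init after deserialization
        }
        decoder = DecoderFactory.get().binaryDecoder(message, decoder);
        return reader.read(null, decoder);
    }

    @Override
    public boolean isEndOfStream(MyAvroRecord nextElement) {
        return false;
    }

    @Override
    public TypeInformation<MyAvroRecord> getProducedType() {
        return TypeExtractor.getForClass(MyAvroRecord.class);
    }
}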

Event-time window on a Kafka source stream

Submitted by 徘徊边缘 on 2019-12-11 17:58:15
Question: There is a topic on a Kafka server. In the program, we read this topic as a stream and assign event timestamps, then perform a window operation on the stream. But the program doesn't work. After debugging, it seems that the processWatermark method of WindowOperator is never executed. Here is my code.

DataStream<Tuple2<String, Long>> advertisement = env
    .addSource(new FlinkKafkaConsumer082<String>("advertisement", new SimpleStringSchema(), properties))
    .map(new MapFunction<String, Tuple2<String, Long>>() {
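One thing worth checking (hedged, since only part of the code is shown): processWatermark only runs if watermarks are actually generated, which requires the job to use event time and the Kafka-derived stream to get a timestamp/watermark assigner. A sketch of that wiring for later Flink versions than the FlinkKafkaConsumer082 era, reusing `env` and `advertisement` from the question; the field carrying the timestamp and the 5/10-second bounds are assumptions:

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor;
import org.apache.flink.streaming.api.windowing.time.Time;

env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);

DataStream<Tuple2<String, Long>> withTimestamps = advertisement
    .assignTimestampsAndWatermarks(
        new BoundedOutOfOrdernessTimestampExtractor<Tuple2<String, Long>>(Time.seconds(5)) {
            @Override
            public long extractTimestamp(Tuple2<String, Long> element) {
                return element.f1; // assumed: f1 holds the event time in milliseconds
            }
        });

withTimestamps
    .keyBy(0)
    .timeWindow(Time.seconds(10))
    .sum(1)
    .print();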