apache-flink

Flink Custom Trigger giving Unexpected Output

Submitted by 白昼怎懂夜的黑 on 2019-12-12 02:45:26
Question: I want to create a Trigger that fires for the first time after 20 seconds and then every five seconds after that. I have used GlobalWindows and a custom Trigger:

val windowedStream = valueStream
  .keyBy(0)
  .window(GlobalWindows.create())
  .trigger(TradeTrigger.of())

Here is the code in TradeTrigger:

@PublicEvolving
public class TradeTrigger<W extends Window> extends Trigger<Object, W> {
    private static final long serialVersionUID = 1L;
    static boolean flag = false;
    static long ctime = System
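For reference, below is a rough sketch of one way such a trigger could be written without static fields (statics are shared across all keys and tasks and are generally unsafe in Flink), using per-window state and processing-time timers instead. The class name, state name, and the 20-second/5-second values are assumptions for illustration, not the asker's actual implementation.

import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.streaming.api.windowing.triggers.Trigger;
import org.apache.flink.streaming.api.windowing.triggers.TriggerResult;
import org.apache.flink.streaming.api.windowing.windows.Window;

public class PeriodicAfterDelayTrigger<W extends Window> extends Trigger<Object, W> {
    private static final long serialVersionUID = 1L;
    private static final long INITIAL_DELAY_MS = 20_000L; // assumed delay before the first firing
    private static final long INTERVAL_MS = 5_000L;       // assumed interval between later firings

    private final ValueStateDescriptor<Boolean> startedDesc =
            new ValueStateDescriptor<>("timerStarted", Boolean.class);

    @Override
    public TriggerResult onElement(Object element, long timestamp, W window, TriggerContext ctx) throws Exception {
        ValueState<Boolean> started = ctx.getPartitionedState(startedDesc);
        if (started.value() == null) {
            // first element for this key/window: schedule the first firing 20 seconds from now
            started.update(true);
            ctx.registerProcessingTimeTimer(ctx.getCurrentProcessingTime() + INITIAL_DELAY_MS);
        }
        return TriggerResult.CONTINUE;
    }

    @Override
    public TriggerResult onProcessingTime(long time, W window, TriggerContext ctx) {
        // fire now and re-arm the timer for the next 5-second interval
        ctx.registerProcessingTimeTimer(time + INTERVAL_MS);
        return TriggerResult.FIRE;
    }

    @Override
    public TriggerResult onEventTime(long time, W window, TriggerContext ctx) {
        return TriggerResult.CONTINUE;
    }

    @Override
    public void clear(W window, TriggerContext ctx) throws Exception {
        ctx.getPartitionedState(startedDesc).clear();
    }
}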

StreamingFileSink not ingesting data to S3

Submitted by ↘锁芯ラ on 2019-12-12 01:28:47
Question: I have created a simple ingestion service that picks up on-premise files and ingests them into S3 using StreamingFileSink. https://ci.apache.org/projects/flink/flink-docs-stable/dev/connectors/streamfile_sink.html I have set everything up as per the documentation, but it is not working. I tested with the sink location pointing to another local on-prem path, and the files do get there (but hidden as .part files). Does this mean part files are also sent to S3 but are not visible? ...

final StreamExecutionEnvironment env =
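The hidden .part files are expected behaviour: StreamingFileSink keeps files in an in-progress state and only finalizes them when a checkpoint completes, so without checkpointing enabled nothing ever becomes a visible, finished file. A rough sketch of the usual wiring, where the checkpoint interval, bucket path, and s3a URL are placeholder assumptions and `lines` stands for whatever DataStream<String> the ingestion service produces:

import org.apache.flink.api.common.serialization.SimpleStringEncoder;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;

final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.enableCheckpointing(60_000); // part files are finalized on each completed checkpoint

final StreamingFileSink<String> sink = StreamingFileSink
        .forRowFormat(new Path("s3a://my-bucket/ingest"), new SimpleStringEncoder<String>("UTF-8"))
        .build();

lines.addSink(sink);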

Should the entire cluster be restarted if a single Task Manager crashes?

Submitted by 帅比萌擦擦* on 2019-12-12 01:27:15
Question: We're running a standalone Flink cluster with 2 Job Managers and 3 Task Managers. Whenever a TM crashes, we simply restart that particular TM and proceed with the processing. But reading the comments on this question makes it look like we need to restart all 5 nodes that form the cluster to deal with the failure of a single TM. Am I reading this right? What would be the consequences if we restart just the crashed TM and let the healthy ones run as is?

Answer 1: Sorry if my answer elsewhere was
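As a side note, on a standalone cluster a single crashed TaskManager can normally be brought back with the standard scripts on that host alone, while the JobManagers and the healthy TaskManagers keep running; affected jobs are then restarted according to the configured restart strategy. A rough sketch of the commands, run from the Flink distribution directory on the affected host:

# only if a half-dead TaskManager process is still around
./bin/taskmanager.sh stop
# bring a fresh TaskManager back up on this host
./bin/taskmanager.sh start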

Apache Flink - Mini cluster - Windowing operator execution problem

Submitted by 倖福魔咒の on 2019-12-12 01:25:28
Question: This turned out to be the problem behind the question below: Apache flink - job simple windowing problem - java.lang.RuntimeException: segment has been freed - Mini Cluster problem. So I wanted to ask again with specific details. Adding a very simple windowing operator to the job causes the error below in a MINI CLUSTER environment:

Caused by: java.lang.RuntimeException: segment has been freed
    at org.apache.flink.streaming.runtime.io.RecordWriterOutput.emitWatermark(RecordWriterOutput.java:123)
    at org.apache
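For context, here is a minimal sketch of the kind of "very simple windowing operator" being described, run against a local mini cluster; the element values, key, and the 5-second window size are placeholders rather than the asker's actual job:

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;

final StreamExecutionEnvironment env = StreamExecutionEnvironment.createLocalEnvironment();

// a trivial keyed processing-time window over two elements
env.fromElements(Tuple2.of("key", 1L), Tuple2.of("key", 2L))
   .keyBy(0)
   .timeWindow(Time.seconds(5))
   .sum(1)
   .print();

env.execute("mini-cluster-window-test");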

How to utilize Flink's TestHarness class?

Submitted by 眉间皱痕 on 2019-12-11 18:49:07
Question: I need to test a CoFlatMapFunction that shares state. Through my reading I have come to the conclusion that I should use the TestHarness class, per: https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/testing.html#testing-checkpointing-and-state-handling Since it is not part of the public API, I cannot figure out how to import it without copying and pasting the class itself. I thought it might be in flink-test-utils-junit, but it was not there either.

Answer 1: You'll need to add these 4 dependencies to
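For reference, the testing page linked above (for Flink versions of that era) documents pulling the harness classes in through test-scoped artifacts roughly like the following; the Scala suffix and the version property are assumptions to adapt to your build, and the answer's exact list of four dependencies is not reproduced here:

<!-- test-scoped artifacts that carry the test harness classes -->
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-test-utils_2.11</artifactId>
  <version>${flink.version}</version>
  <scope>test</scope>
</dependency>
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-runtime_2.11</artifactId>
  <version>${flink.version}</version>
  <classifier>tests</classifier>
  <scope>test</scope>
</dependency>
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-streaming-java_2.11</artifactId>
  <version>${flink.version}</version>
  <classifier>tests</classifier>
  <scope>test</scope>
</dependency>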

How to generate a dynamic path for a DataSet during output

Submitted by 邮差的信 on 2019-12-11 18:44:00
Question: Is there a way to create a dynamic DataSink output path in Flink? The DataSet has elements of type Tuple2<String, String>. When we used the streaming API I had a way to generate a dynamic path with a custom Bucketer like below:

@Override
public Path getBucketPath(Clock clock, Path basePath, Tuple2<String, String> element) {
    return new Path(basePath + "/schema=" + element.f0.toLowerCase().trim() + "/");
}

I would like to know whether there is a similar way to generate a custom path for a DataSet.

Answer 1: I
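Since the DataSet API has no built-in bucketing sink, one common workaround is to collect the distinct key values first and then write one filtered DataSet per derived path. A rough sketch building on the question's Tuple2<String, String> data; `data`, `basePath`, and `env` are assumed to already exist, and the number of distinct schemas is assumed to be small:

import java.util.List;
import org.apache.flink.api.common.typeinfo.Types;

List<String> schemas = data
        .map(t -> t.f0.toLowerCase().trim()).returns(Types.STRING)
        .distinct()
        .collect(); // triggers a small job just to learn the schema names

for (String schema : schemas) {
    data.filter(t -> schema.equals(t.f0.toLowerCase().trim()))
        .writeAsCsv(basePath + "/schema=" + schema + "/part", "\n", ",");
}
env.execute("write-per-schema");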

Flink: Write tuples with CSV header into file

Submitted by 一笑奈何 on 2019-12-11 18:38:53
Question: I did some data processing using Flink (1.7.1 with Hadoop). At the end I'd like to write the dataset, consisting of 2-tuples, into a file. Currently, I am doing it like this:

DataSet<Tuple2<Integer, Point>> pointsClustered = points.getClusteredPoints(...);
pointsClustered.writeAsCsv(params.get("output"), "\n", ",");

However, I would like to have the CSV headers written into the first line. Flink's Javadoc API doesn't state any options for this. Furthermore, I couldn't find any solution by googling
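writeAsCsv indeed offers no header option; one workaround is to format the rows as strings yourself and prepend the header in a single-parallelism mapPartition so it stays at the top of the single output file. A rough sketch reusing the names from the question; the header text and the assumption that Point has a usable toString() are placeholders:

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.functions.MapPartitionFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.util.Collector;

DataSet<String> rows = pointsClustered.map(new MapFunction<Tuple2<Integer, Point>, String>() {
    @Override
    public String map(Tuple2<Integer, Point> t) {
        return t.f0 + "," + t.f1; // relies on Point.toString() producing the desired CSV fields
    }
});

rows.mapPartition(new MapPartitionFunction<String, String>() {
        @Override
        public void mapPartition(Iterable<String> values, Collector<String> out) {
            out.collect("cluster_id,point"); // assumed header line
            for (String row : values) {
                out.collect(row);
            }
        }
    })
    .setParallelism(1)
    .writeAsText(params.get("output"))
    .setParallelism(1);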

Starting a batch process from a stream job

Submitted by 北慕城南 on 2019-12-11 18:17:38
Question: Hi, I have a Maven project for Flink stream processing. Based on the message I get from the stream, I start a batch process, but currently I am getting an error. I am pretty new to this Flink world, so please let me know if you have any idea. Here is the code I am using to start a standalone cluster.

final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
KafkaConsumerService kafkaConsumerService = new KafkaConsumerService();
FlinkKafkaConsumer010<String>
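Without the full error it is hard to say more, but for context here is a minimal, hedged sketch of the streaming side being described; the topic name, bootstrap servers, and group id are placeholders, and how the batch job is then launched is deliberately left out since that is exactly what the question is about:

import java.util.Properties;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010;

Properties props = new Properties();
props.setProperty("bootstrap.servers", "localhost:9092"); // placeholder
props.setProperty("group.id", "batch-trigger");           // placeholder

final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

DataStream<String> control = env.addSource(
        new FlinkKafkaConsumer010<>("control-topic", new SimpleStringSchema(), props));

// react to the control message; the batch work itself is better submitted as a separate job
control.map(msg -> "batch requested by message: " + msg).print();

env.execute("stream-that-triggers-batch");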

Trouble with deserializing Avro data in Scala

Submitted by 若如初见. on 2019-12-11 18:06:16
Question: I am building an Apache Flink application in Scala which reads streaming data from a Kafka bus and then performs summarizing operations on it. The data from Kafka is in Avro format and needs a special deserialization class. I found this Scala class AvroDeserializationSchema (http://codegists.com/snippet/scala/avrodeserializationschemascala_saveveltri_scala):

package org.myorg.quickstart
import org.apache.avro.io.BinaryDecoder
import org.apache.avro.io.DatumReader
import org.apache.avro.io
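For comparison, here is the usual hand-rolled pattern for such a schema, sketched in Java for consistency with the other snippets on this page (newer Flink versions also ship an AvroDeserializationSchema in the flink-avro module, which may remove the need for a custom class). MyAvroRecord is a placeholder for an Avro-generated SpecificRecord class, and the DeserializationSchema package differs between older and newer Flink versions:

import java.io.IOException;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.specific.SpecificDatumReader;
import org.apache.flink.api.common.serialization.DeserializationSchema;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.java.typeutils.TypeExtractor;

public class MyAvroDeserializationSchema implements DeserializationSchema<MyAvroRecord> {
    private static final long serialVersionUID = 1L;
    private transient SpecificDatumReader<MyAvroRecord> reader;
    private transient BinaryDecoder decoder;

    @Override
    public MyAvroRecord deserialize(byte[] message) throws IOException {
        if (reader == null) {
            reader = new SpecificDatumReader<>(MyAvroRecord.class); // lazy init after deserialization
        }
        decoder = DecoderFactory.get().binaryDecoder(message, decoder);
        return reader.read(null, decoder);
    }

    @Override
    public boolean isEndOfStream(MyAvroRecord nextElement) {
        return false;
    }

    @Override
    public TypeInformation<MyAvroRecord> getProducedType() {
        return TypeExtractor.getForClass(MyAvroRecord.class);
    }
}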

Event-time window on a Kafka source stream

Submitted by 徘徊边缘 on 2019-12-11 17:58:15
Question: There is a topic on a Kafka server. In the program, we read this topic as a stream and assign event timestamps, then perform a window operation on the stream. But the program doesn't work. After debugging, it seems that the processWatermark method of WindowOperator is never executed. Here is my code.

DataStream<Tuple2<String, Long>> advertisement = env
    .addSource(new FlinkKafkaConsumer082<String>("advertisement", new SimpleStringSchema(), properties))
    .map(new MapFunction<String, Tuple2<String, Long>>() {
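One thing worth checking (hedged, since only part of the code is shown): processWatermark only runs if watermarks are actually generated, which requires the job to use event time and the Kafka-derived stream to get a timestamp/watermark assigner. A sketch of that wiring for later Flink versions than the FlinkKafkaConsumer082 era, reusing `env` and `advertisement` from the question; the field carrying the timestamp and the 5/10-second bounds are assumptions:

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor;
import org.apache.flink.streaming.api.windowing.time.Time;

env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);

DataStream<Tuple2<String, Long>> withTimestamps = advertisement
    .assignTimestampsAndWatermarks(
        new BoundedOutOfOrdernessTimestampExtractor<Tuple2<String, Long>>(Time.seconds(5)) {
            @Override
            public long extractTimestamp(Tuple2<String, Long> element) {
                return element.f1; // assumed: f1 holds the event time in milliseconds
            }
        });

withTimestamps
    .keyBy(0)
    .timeWindow(Time.seconds(10))
    .sum(1)
    .print();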