apache-flink

Elasticsearch 5 connector in Apache Flink 1.3

By reading the documentation I understood that with Apache Flink 1.3 I should be able to use Elasticsearch 5.x. However, my pom.xml contains:

<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-elasticsearch5_2.10</artifactId>
    <version>1.3.0</version>
</dependency>

and I get this:

Dependency "org.apache.flink:flink-connector-elasticsearch5_2.10:1.3.0" not found

Any idea why this dependency cannot be found?

This was a bug in the 1.3.0 release and is being fixed for 1.3.1 (which is due very soon). See the mailing list for more details. Use the following in the pom.xml: <dependency>
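The answer is cut off above. A plausible completion, assuming the fixed artifact keeps the same coordinates and only the version changes to the 1.3.1 release the answer mentions:

```xml
<!-- assumption: same coordinates, bumped to the fixed 1.3.1 release -->
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-elasticsearch5_2.10</artifactId>
    <version>1.3.1</version>
</dependency>
```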

flink kafka consumer groupId not working

Question: I am using Kafka with Flink. In a simple program, I used Flink's FlinkKafkaConsumer09 and assigned a group id to it. According to Kafka's behavior, when I run 2 consumers on the same topic with the same group.id, it should work like a message queue. I think it's supposed to work like this: if 2 messages are sent to Kafka, the two Flink programs together would process the 2 messages exactly once between them (let's say 2 lines of output in total). But the actual result is that each program would receive 2 pieces of the
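For reference, a minimal sketch of the setup being described; the topic and group names are made up, since the question's code is not shown:

```java
import java.util.Properties;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer09;
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

public class KafkaGroupIdExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("group.id", "my-group"); // same group.id in both programs

        // FlinkKafkaConsumer09 takes the topic, a deserialization schema, and properties
        DataStream<String> stream = env.addSource(
            new FlinkKafkaConsumer09<>("my-topic", new SimpleStringSchema(), props));

        stream.print();
        env.execute("kafka-group-id-test");
    }
}
```

The usual explanation for the observed behavior: Flink's Kafka connector assigns partitions to its parallel source instances itself rather than through Kafka's consumer-group coordination, so two separate jobs with the same group.id each read all partitions.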

TaskManager was lost/killed

When I am trying to run the Flink job in a standalone cluster I get this error:

java.lang.Exception: TaskManager was lost/killed: ResourceID{resourceId='2961948b9ac490c11c6e41b0ec197e9f'} @ localhost (dataPort=55795)
    at org.apache.flink.runtime.instance.SimpleSlot.releaseSlot(SimpleSlot.java:217)
    at org.apache.flink.runtime.instance.SlotSharingGroupAssignment.releaseSharedSlot(SlotSharingGroupAssignment.java:533)
    at org.apache.flink.runtime.instance.SharedSlot.releaseSlot(SharedSlot.java:192)
    at org.apache.flink.runtime.instance.Instance.markDead(Instance.java:167)
    at org.apache.flink.runtime

How to increase Flink taskmanager.numberOfTaskSlots to run it without a Flink server (in IDE or fat jar)

I have a question about running a Flink streaming job in the IDE or as a fat jar without deploying it to a Flink server. The problem is that I cannot run it in the IDE when my job needs more than 1 task slot.

public class StreamingJob {
    public static void main(String[] args) throws Exception {
        // set up the streaming execution environment
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        Properties kafkaProperties = new Properties();
        kafkaProperties.setProperty("bootstrap.servers", "localhost:9092");
        kafkaProperties.setProperty("group.id", "test");
        env
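A sketch of the usual fix (not taken from the truncated post): when creating a local environment you can pass a Configuration that raises the slot count of the embedded mini-cluster; the value 4 here is arbitrary:

```java
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class LocalSlotsExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // raise the slot count of the embedded mini-cluster
        conf.setInteger("taskmanager.numberOfTaskSlots", 4);

        // createLocalEnvironment accepts a parallelism and a Configuration
        StreamExecutionEnvironment env =
            StreamExecutionEnvironment.createLocalEnvironment(4, conf);

        env.fromElements(1, 2, 3).print();
        env.execute("local-slots-test");
    }
}
```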

Flink CsvTableSource Streaming

I want to stream a CSV file and perform SQL operations using Flink, but the code I have written just reads the file once and stops. It does not stream. Thanks in advance.

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
StreamTableEnvironment tableEnv = StreamTableEnvironment.getTableEnvironment(env);
CsvTableSource csvtable = CsvTableSource.builder()
    .path("D:/employee.csv")
    .ignoreFirstLine()
    .fieldDelimiter(",")
    .field("id", Types.INT())
    .field("name", Types.STRING())
    .field("designation", Types.STRING())
    .field("age", Types.INT())
    .field("location", Types
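The question is cut off before any answer appears. For what it's worth, CsvTableSource is a bounded source, so reading the file once is its expected behavior; a sketch of one common workaround, monitoring the file continuously with the DataStream API (the 1000 ms interval is illustrative):

```java
import org.apache.flink.api.java.io.TextInputFormat;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.source.FileProcessingMode;

public class ContinuousCsvExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        String path = "D:/employee.csv";
        // re-scan the path every second; note a modified file is re-processed in full
        DataStream<String> lines = env.readFile(
            new TextInputFormat(new Path(path)),
            path,
            FileProcessingMode.PROCESS_CONTINUOUSLY,
            1000L);

        lines.print();
        env.execute("continuous-csv-read");
    }
}
```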

How to configure Flink cluster for logging via web ui?

Question: I have a Flink cluster set up and I'd like to be able to view the logs and stdout for the JobManager and TaskManagers. When I go to the web UI, I see the following error messages on the respective tabs:

JobManager:
    Logs: (log file unavailable)
    Stdout: (stdout file unavailable)
TaskManager:
    Logs: Fetching TaskManager log failed.
    Stdout: Fetching TaskManager log failed.

I can see that there are some config parameters that could be set, notably taskmanager.log.path, jobmanager.web.log.path and env
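A sketch of what those entries might look like in flink-conf.yaml; the paths are hypothetical and must point at the files your logging configuration actually writes:

```yaml
# hypothetical paths; adjust to wherever log4j writes on your machines
jobmanager.web.log.path: /opt/flink/log/flink-jobmanager.log
taskmanager.log.path: /opt/flink/log/flink-taskmanager.log
```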

Apache Flink: What's the difference between side outputs and split() in the DataStream API?

Apache Flink has a split API that lets you branch data streams:

val splited = datastream.split { i => i match {
    case i if ... => Seq("red", "blue")
    case _ => Seq("green")
}}

splited.select("green").flatMap { .... }

It also provides another approach called Side Outputs ( https://ci.apache.org/projects/flink/flink-docs-release-1.5/dev/stream/side_output.html ) that lets you do the same thing. What's the difference between these two ways? Do they use the same lower-level construct? Do they cost the same? When and how should we choose one of them?

The split operator is part of the DataStream
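To make the comparison concrete, a minimal side-output sketch in Java; the Integer element type and the predicate are made up for illustration:

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.util.Collector;
import org.apache.flink.util.OutputTag;

public class SideOutputSketch {
    // anonymous subclass so the OutputTag keeps its type information
    static final OutputTag<Integer> RED_BLUE = new OutputTag<Integer>("red-blue") {};

    static DataStream<Integer> splitWithSideOutput(DataStream<Integer> input) {
        SingleOutputStreamOperator<Integer> green = input
            .process(new ProcessFunction<Integer, Integer>() {
                @Override
                public void processElement(Integer value, Context ctx, Collector<Integer> out) {
                    if (value % 2 == 0) {            // made-up predicate
                        ctx.output(RED_BLUE, value); // goes to the side output
                    } else {
                        out.collect(value);          // main ("green") output
                    }
                }
            });
        // the side output is retrieved from the operator that produced it
        DataStream<Integer> redBlue = green.getSideOutput(RED_BLUE);
        redBlue.print();
        return green;
    }
}
```

One design difference worth noting: split() tags elements with strings and re-filters them at select(), while side outputs are typed and emitted directly by the operator; split() was later deprecated in favor of side outputs.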

How to handle errors in custom MapFunction correctly?

I have implemented a MapFunction for my Apache Flink flow. It parses incoming elements and converts them to another format, but sometimes errors can appear (i.e. the incoming data is not valid). I see two possible ways to handle this:

Ignore invalid elements, but it seems I can't ignore errors because for any incoming element I must provide an outgoing element.
Split incoming elements into valid and invalid, but it seems I should use another function for this.

So, I have two questions:

How do I handle errors correctly in my MapFunction?
How do I implement such transformation functions correctly?

You could
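The answer breaks off above. A sketch of the usual approach, replacing MapFunction with FlatMapFunction so invalid elements can simply be dropped; the String-to-Integer parsing is a stand-in for the real conversion:

```java
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.util.Collector;

public class SafeParse implements FlatMapFunction<String, Integer> {
    @Override
    public void flatMap(String raw, Collector<Integer> out) {
        try {
            // stand-in for the real parsing/conversion logic
            out.collect(Integer.parseInt(raw.trim()));
        } catch (NumberFormatException e) {
            // invalid element: emit nothing (or log it, or route it elsewhere)
        }
    }
}
```

Usage: DataStream<Integer> parsed = input.flatMap(new SafeParse()); — unlike map(), flatMap() may emit zero records per input, which is exactly what makes ignoring invalid elements possible.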

Flink BucketingSink with custom AvroParquetWriter creates empty files

Question: I have created a writer for BucketingSink. The sink and writer work without error, but when the writer writes Avro GenericRecords to Parquet, the file moves from in-progress through pending to completed, yet the files are empty with 0 bytes. Can anyone tell me what is wrong with the code? I have tried placing the initialization of AvroParquetWriter in the open() method, but the result is still the same. When debugging the code, I confirmed that writer.write(element) does execute and

Apache Flink - custom java options are not recognized inside job

I've added the following line to flink-conf.yaml:

env.java.opts: "-Ddy.props.path=/PATH/TO/PROPS/FILE"

When starting the jobmanager (jobmanager.sh start cluster) I see in the logs that the JVM option is indeed recognized:

2017-02-20 12:19:23,536 INFO org.apache.flink.runtime.jobmanager.JobManager - JVM Options:
2017-02-20 12:19:23,536 INFO org.apache.flink.runtime.jobmanager.JobManager - -Xms256m
2017-02-20 12:19:23,536 INFO org.apache.flink.runtime.jobmanager.JobManager - -Xmx256m
2017-02-20 12:19:23,536 INFO org.apache.flink.runtime.jobmanager.JobManager - -XX:MaxPermSize=256m
2017-02-20 12:19:23,536
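The post is truncated, but a detail worth knowing here (an assumption about the usual cause, not taken from the original answer): env.java.opts is applied to the JobManager and TaskManager JVMs, while a job's main() method runs in the client JVM, so the property is typically only visible inside operator code. A sketch that probes both sides:

```java
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class JavaOptsProbe {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // here in main() we are in the client JVM: the -D option from
        // env.java.opts is usually NOT set
        System.out.println("client sees: " + System.getProperty("dy.props.path"));

        env.fromElements("x").map(new MapFunction<String, String>() {
            @Override
            public String map(String value) {
                // this runs on a TaskManager JVM, which did receive env.java.opts
                return "task sees: " + System.getProperty("dy.props.path");
            }
        }).print();

        env.execute("java-opts-probe");
    }
}
```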