apache-flink

Elasticsearch 5 connector in Apache Flink 1.3

By reading the documentation I understood that with Apache Flink 1.3 I should be able to use Elasticsearch 5.x. However, my pom.xml contains:

<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-elasticsearch5_2.10</artifactId>
    <version>1.3.0</version>
</dependency>

and I get this:

Dependency "org.apache.flink:flink-connector-elasticsearch5_2.10:1.3.0" not found

Any idea why this dependency cannot be found?

This was a bug in the 1.3.0 release and is being fixed for 1.3.1 (which is due very soon). See the mailing list for more details. Use the following in the pom.xml: <dependency>
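The answer is cut off above. A plausible completion, assuming the fixed artifact keeps the same coordinates and only the version changes to the 1.3.1 release the answer mentions:

```xml
<!-- assumption: same coordinates, bumped to the fixed 1.3.1 release -->
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-elasticsearch5_2.10</artifactId>
    <version>1.3.1</version>
</dependency>
```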

flink kafka consumer groupId not working

Question: I am using Kafka with Flink. In a simple program, I used Flink's FlinkKafkaConsumer09 and assigned a group id to it. According to Kafka's behavior, when I run 2 consumers on the same topic with the same group.id, it should work like a message queue. I think it's supposed to work like this: if 2 messages are sent to Kafka, the two Flink programs together would process the 2 messages exactly once between them (let's say 2 lines of output in total). But the actual result is that each program would receive 2 pieces of the
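For reference, a minimal sketch of the setup being described; the topic and group names are made up, since the question's code is not shown:

```java
import java.util.Properties;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer09;
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

public class KafkaGroupIdExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("group.id", "my-group"); // same group.id in both programs

        // FlinkKafkaConsumer09 takes the topic, a deserialization schema, and properties
        DataStream<String> stream = env.addSource(
            new FlinkKafkaConsumer09<>("my-topic", new SimpleStringSchema(), props));

        stream.print();
        env.execute("kafka-group-id-test");
    }
}
```

The usual explanation for the observed behavior: Flink's Kafka connector assigns partitions to its parallel source instances itself rather than through Kafka's consumer-group coordination, so two separate jobs with the same group.id each read all partitions.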

TaskManager was lost/killed

When I am trying to run the Flink job in a standalone cluster I get this error:

java.lang.Exception: TaskManager was lost/killed: ResourceID{resourceId='2961948b9ac490c11c6e41b0ec197e9f'} @ localhost (dataPort=55795)
    at org.apache.flink.runtime.instance.SimpleSlot.releaseSlot(SimpleSlot.java:217)
    at org.apache.flink.runtime.instance.SlotSharingGroupAssignment.releaseSharedSlot(SlotSharingGroupAssignment.java:533)
    at org.apache.flink.runtime.instance.SharedSlot.releaseSlot(SharedSlot.java:192)
    at org.apache.flink.runtime.instance.Instance.markDead(Instance.java:167)
    at org.apache.flink.runtime

How to increase Flink taskmanager.numberOfTaskSlots to run it without a Flink server (in IDE or fat jar)

I have a question about running a Flink streaming job in the IDE or as a fat jar without deploying it to a Flink server. The problem is that I cannot run it in the IDE when my job needs more than 1 task slot.

public class StreamingJob {
    public static void main(String[] args) throws Exception {
        // set up the streaming execution environment
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        Properties kafkaProperties = new Properties();
        kafkaProperties.setProperty("bootstrap.servers", "localhost:9092");
        kafkaProperties.setProperty("group.id", "test");
        env
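A sketch of the usual fix (not taken from the truncated post): when creating a local environment you can pass a Configuration that raises the slot count of the embedded mini-cluster; the value 4 here is arbitrary:

```java
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class LocalSlotsExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // raise the slot count of the embedded mini-cluster
        conf.setInteger("taskmanager.numberOfTaskSlots", 4);

        // createLocalEnvironment accepts a parallelism and a Configuration
        StreamExecutionEnvironment env =
            StreamExecutionEnvironment.createLocalEnvironment(4, conf);

        env.fromElements(1, 2, 3).print();
        env.execute("local-slots-test");
    }
}
```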

Flink CsvTableSource Streaming

I want to stream a CSV file and perform SQL operations using Flink, but the code I have written just reads the file once and stops. It does not stream. Thanks in advance.

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
StreamTableEnvironment tableEnv = StreamTableEnvironment.getTableEnvironment(env);
CsvTableSource csvtable = CsvTableSource.builder()
    .path("D:/employee.csv")
    .ignoreFirstLine()
    .fieldDelimiter(",")
    .field("id", Types.INT())
    .field("name", Types.STRING())
    .field("designation", Types.STRING())
    .field("age", Types.INT())
    .field("location", Types
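The question is cut off before any answer appears. For what it's worth, CsvTableSource is a bounded source, so reading the file once is its expected behavior; a sketch of one common workaround, monitoring the file continuously with the DataStream API (the 1000 ms interval is illustrative):

```java
import org.apache.flink.api.java.io.TextInputFormat;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.source.FileProcessingMode;

public class ContinuousCsvExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        String path = "D:/employee.csv";
        // re-scan the path every second; note a modified file is re-processed in full
        DataStream<String> lines = env.readFile(
            new TextInputFormat(new Path(path)),
            path,
            FileProcessingMode.PROCESS_CONTINUOUSLY,
            1000L);

        lines.print();
        env.execute("continuous-csv-read");
    }
}
```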

How to configure Flink cluster for logging via web ui?

Question: I have a Flink cluster set up and I'd like to be able to view the logs and stdout for the JobManager and TaskManagers. When I go to the web UI, I see the following error messages on the respective tabs:

JobManager:
    Logs: (log file unavailable)
    Stdout: (stdout file unavailable)
TaskManager:
    Logs: Fetching TaskManager log failed.
    Stdout: Fetching TaskManager log failed.

I can see that there are some config parameters that could be set, notably taskmanager.log.path, jobmanager.web.log.path and env
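A sketch of what those entries might look like in flink-conf.yaml; the paths are hypothetical and must point at the files your logging configuration actually writes:

```yaml
# hypothetical paths; adjust to wherever log4j writes on your machines
jobmanager.web.log.path: /opt/flink/log/flink-jobmanager.log
taskmanager.log.path: /opt/flink/log/flink-taskmanager.log
```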

Apache Flink: What's the difference between side outputs and split() in the DataStream API?

Apache Flink has a split API that lets you branch data streams:

val splited = datastream.split { i => i match {
    case i if ... => Seq("red", "blue")
    case _ => Seq("green")
}}

splited.select("green").flatMap { .... }

It also provides another approach called Side Outputs ( https://ci.apache.org/projects/flink/flink-docs-release-1.5/dev/stream/side_output.html ) that lets you do the same thing. What's the difference between these two ways? Do they use the same lower-level construct? Do they cost the same? When and how should we choose one of them?

The split operator is part of the DataStream
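To make the comparison concrete, a minimal side-output sketch in Java; the Integer element type and the predicate are made up for illustration:

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.util.Collector;
import org.apache.flink.util.OutputTag;

public class SideOutputSketch {
    // anonymous subclass so the OutputTag keeps its type information
    static final OutputTag<Integer> RED_BLUE = new OutputTag<Integer>("red-blue") {};

    static DataStream<Integer> splitWithSideOutput(DataStream<Integer> input) {
        SingleOutputStreamOperator<Integer> green = input
            .process(new ProcessFunction<Integer, Integer>() {
                @Override
                public void processElement(Integer value, Context ctx, Collector<Integer> out) {
                    if (value % 2 == 0) {            // made-up predicate
                        ctx.output(RED_BLUE, value); // goes to the side output
                    } else {
                        out.collect(value);          // main ("green") output
                    }
                }
            });
        // the side output is retrieved from the operator that produced it
        DataStream<Integer> redBlue = green.getSideOutput(RED_BLUE);
        redBlue.print();
        return green;
    }
}
```

One design difference worth noting: split() tags elements with strings and re-filters them at select(), while side outputs are typed and emitted directly by the operator; split() was later deprecated in favor of side outputs.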

How to handle errors in custom MapFunction correctly?

I have implemented a MapFunction for my Apache Flink flow. It parses incoming elements and converts them to another format, but sometimes errors can appear (i.e. the incoming data is not valid). I see two possible ways to handle this:

Ignore invalid elements, but it seems I can't ignore errors because for any incoming element I must provide an outgoing element.
Split incoming elements into valid and invalid, but it seems I should use another function for this.

So, I have two questions:

How do I handle errors correctly in my MapFunction?
How do I implement such transformation functions correctly?

You could
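The answer breaks off above. A sketch of the usual approach, replacing MapFunction with FlatMapFunction so invalid elements can simply be dropped; the String-to-Integer parsing is a stand-in for the real conversion:

```java
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.util.Collector;

public class SafeParse implements FlatMapFunction<String, Integer> {
    @Override
    public void flatMap(String raw, Collector<Integer> out) {
        try {
            // stand-in for the real parsing/conversion logic
            out.collect(Integer.parseInt(raw.trim()));
        } catch (NumberFormatException e) {
            // invalid element: emit nothing (or log it, or route it elsewhere)
        }
    }
}
```

Usage: DataStream<Integer> parsed = input.flatMap(new SafeParse()); — unlike map(), flatMap() may emit zero records per input, which is exactly what makes ignoring invalid elements possible.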

Flink BucketingSink with custom AvroParquetWriter creates empty files

Question: I have created a writer for BucketingSink. The sink and writer work without error, but when the writer writes Avro GenericRecords to Parquet, the file moves from in-progress through pending to completed, yet the files are empty with 0 bytes. Can anyone tell me what is wrong with the code? I have tried placing the initialization of AvroParquetWriter in the open() method, but the result is still the same. When debugging the code, I confirmed that writer.write(element) does execute and

Apache Flink - custom java options are not recognized inside job

I've added the following line to flink-conf.yaml:

env.java.opts: "-Ddy.props.path=/PATH/TO/PROPS/FILE"

When starting the jobmanager (jobmanager.sh start cluster) I see in the logs that the JVM option is indeed recognized:

2017-02-20 12:19:23,536 INFO org.apache.flink.runtime.jobmanager.JobManager - JVM Options:
2017-02-20 12:19:23,536 INFO org.apache.flink.runtime.jobmanager.JobManager - -Xms256m
2017-02-20 12:19:23,536 INFO org.apache.flink.runtime.jobmanager.JobManager - -Xmx256m
2017-02-20 12:19:23,536 INFO org.apache.flink.runtime.jobmanager.JobManager - -XX:MaxPermSize=256m
2017-02-20 12:19:23,536
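The post is truncated, but a detail worth knowing here (an assumption about the usual cause, not taken from the original answer): env.java.opts is applied to the JobManager and TaskManager JVMs, while a job's main() method runs in the client JVM, so the property is typically only visible inside operator code. A sketch that probes both sides:

```java
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class JavaOptsProbe {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // here in main() we are in the client JVM: the -D option from
        // env.java.opts is usually NOT set
        System.out.println("client sees: " + System.getProperty("dy.props.path"));

        env.fromElements("x").map(new MapFunction<String, String>() {
            @Override
            public String map(String value) {
                // this runs on a TaskManager JVM, which did receive env.java.opts
                return "task sees: " + System.getProperty("dy.props.path");
            }
        }).print();

        env.execute("java-opts-probe");
    }
}
```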