apache-flink

Flink streaming: how to control the execution time

Submitted by 本小妞迷上赌 on 2020-01-06 05:55:40
Question: Spark Streaming provides an API for termination, awaitTermination(). Is there any similar API available to gracefully shut down a Flink streaming job after some t seconds?

Answer 1: Your driver program (i.e. the main method) in Flink doesn't stay running while the streaming job executes. Your program should define a dataflow, call execute, and then terminate. In Spark, the driver program stays running (AFAIK), and awaitTermination relates to that. Note that a Flink streaming dataflow continues to execute until the job is cancelled or its sources finish.
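
Building on that: a Flink streaming job finishes on its own once every source returns from its run() method, at which point execute() returns in the driver. A minimal sketch of a source that stops after a fixed time, letting the whole job shut down gracefully (class name, emit rate, and runtime are illustrative assumptions, not from the original answer):

    import org.apache.flink.streaming.api.functions.source.SourceFunction
    import org.apache.flink.streaming.api.scala._

    // Hypothetical bounded source: emits counters until a wall-clock
    // deadline passes, then returns, which ends the streaming job.
    class TimedSource(runtimeMillis: Long) extends SourceFunction[Long] {
      @volatile private var running = true

      override def run(ctx: SourceFunction.SourceContext[Long]): Unit = {
        val deadline = System.currentTimeMillis() + runtimeMillis
        var counter = 0L
        while (running && System.currentTimeMillis() < deadline) {
          ctx.collect(counter)
          counter += 1
          Thread.sleep(100) // illustrative emit rate
        }
      }

      override def cancel(): Unit = { running = false }
    }

    object TimedJob {
      def main(args: Array[String]): Unit = {
        val env = StreamExecutionEnvironment.getExecutionEnvironment
        env.addSource(new TimedSource(60 * 1000L)).print() // run for ~60 seconds
        env.execute("timed-streaming-job") // returns once the source finishes
      }
    }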

Apache Beam with Flink backend throws NoSuchMethodError on calls to protobuf-java library methods

Submitted by 纵饮孤独 on 2020-01-06 03:41:07
Question: I'm trying to run a simple pipeline on a local cluster, using Protocol Buffers to pass data between Beam functions. com.google.protobuf:protobuf-java is included in the fat JAR. Everything works fine if I run it with:

    java -jar target/dataflow-test-1.0-SNAPSHOT.jar \
      --runner=org.apache.beam.runners.flink.FlinkRunner \
      --input=/tmp/kinglear.txt --output=/tmp/wordcounts.txt

But it fails when I try to run it on a Flink cluster:

    flink run target/dataflow-test-1.0-SNAPSHOT.jar \
      --runner=org.apache.beam.runners.flink.FlinkRunner …

Flink 1.7.0 Dashboard does not show Task Statistics

Submitted by 两盒软妹~` on 2020-01-05 07:18:09
Question: I use the Flink 1.7 dashboard and select a streaming job. This should show me some metrics, but it just keeps loading. I deployed the same job on a Flink 1.5 cluster, and there I can see the metrics. Flink is running in Docker Swarm; if I run Flink 1.7 with docker-compose (not in the swarm), it works. I can make it work by deleting the hostname entry in the docker-compose.yaml file:

    version: "3"
    services:
      jobmanager17:
        image: flink:1.7.0-hadoop27-scala_2.11
        hostname: "{{.Node.Hostname}}"
        ports:
          - "8081:8081"
          - "9254…

Flink X-Pack Elasticsearch 5 ElasticsearchSecurityException: missing authentication

Submitted by 天涯浪子 on 2020-01-05 05:37:08
Question: Good morning, everyone. I am trying to use the Flink Elasticsearch connector with Elasticsearch 5.2.1, and I have problems with X-Pack authentication.

Answer 1: Using a different transport client is currently (March 2017, Flink 1.2) not supported in Flink. However, I've filed a JIRA to add the feature: FLINK-6065 (Make TransportClient for ES5 pluggable). Until this has been implemented in Flink, I recommend overriding the ElasticsearchSink and using a different call bridge that creates the PreBuiltXPackTransportClient.

Source: https…
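
The sink/call-bridge override itself depends on Flink connector internals, but the client the answer points to can be sketched: constructing an X-Pack-aware TransportClient for Elasticsearch 5.x (cluster name, credentials, host, and port are placeholder assumptions; the class comes from the x-pack-transport artifact):

    import java.net.InetAddress

    import org.elasticsearch.common.settings.Settings
    import org.elasticsearch.common.transport.InetSocketTransportAddress
    import org.elasticsearch.xpack.client.PreBuiltXPackTransportClient

    // Hypothetical sketch: an authenticated ES 5.x transport client that a
    // custom call bridge could return instead of the default client.
    val settings = Settings.builder()
      .put("cluster.name", "my-cluster")              // placeholder cluster name
      .put("xpack.security.user", "elastic:changeme") // placeholder credentials
      .build()

    val client = new PreBuiltXPackTransportClient(settings)
      .addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("es-host"), 9300))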

How to specify a log file different from the daemon log file when submitting a Flink job to a standalone Flink cluster

Submitted by 十年热恋 on 2020-01-05 05:10:28
Question: When I start a Flink standalone cluster, it writes the daemon logs to the file configured in conf/log4j.properties, and when I submit a Flink job to that cluster, it uses the same properties file and writes the application logs into the same log file on the task managers. I want a separate log file for each application submitted to that standalone cluster. Is there any way to achieve that?

Answer 1: When you submit the job using the ./bin/flink shell script, use the following environment…
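
The answer is cut off before it names the environment variable, so the following is an illustration only, not a reconstruction of the original advice: Flink's launch scripts read FLINK_CONF_DIR (which conf directory, and thus which log4j.properties, to use) and FLINK_LOG_DIR (where to write logs), so pointing a submission at a per-application configuration is one way to separate the client-side logs:

    # Hypothetical sketch: a per-application conf directory containing its
    # own log4j.properties, used only for this submission.
    export FLINK_CONF_DIR=/path/to/myapp-conf
    ./bin/flink run myapp.jar

Note this affects the submitting client; redirecting per-job output on the task managers themselves would still require adjusting the task managers' own log4j configuration.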

Can anyone share a Flink Kafka example in Scala?

Submitted by 谁说胖子不能爱 on 2020-01-03 16:47:09
Question: Can anyone share a working example of Flink with Kafka (mainly receiving messages from Kafka) in Scala? I know there is a KafkaWordCount example in Spark. I just need to print out the Kafka messages in Flink. It would be really helpful.

Answer 1: The following code shows how to read from a Kafka topic using Flink's Scala DataStream API:

    import org.apache.flink.streaming.api.scala._
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer082
    import org.apache.flink.streaming.util.serialization.SimpleStringSchema
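
The answer's code is cut off after the imports. A minimal runnable sketch consistent with them (broker, ZooKeeper address, group id, topic, and job name are placeholders; FlinkKafkaConsumer082 is the Kafka 0.8.2 consumer from the era of this answer):

    import java.util.Properties

    import org.apache.flink.streaming.api.scala._
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer082
    import org.apache.flink.streaming.util.serialization.SimpleStringSchema

    object KafkaPrint {
      def main(args: Array[String]): Unit = {
        val env = StreamExecutionEnvironment.getExecutionEnvironment

        val properties = new Properties()
        properties.setProperty("bootstrap.servers", "localhost:9092") // placeholder broker
        properties.setProperty("zookeeper.connect", "localhost:2181") // needed by the 0.8.x consumer
        properties.setProperty("group.id", "flink-test")              // placeholder group id

        // Read strings from the topic and print each message to stdout.
        env
          .addSource(new FlinkKafkaConsumer082[String]("my-topic", new SimpleStringSchema(), properties))
          .print()

        env.execute("print-kafka-messages")
      }
    }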

Apache Flink DataStream API doesn't have a mapPartition transformation

Submitted by 允我心安 on 2020-01-03 08:50:15
Question: Spark's DStream has a mapPartition API, while Flink's DataStream API doesn't. Could anyone help explain the reason? What I want to do is implement an API similar to Spark's reduceByKey on Flink.

Answer 1: Flink's stream processing model is quite different from Spark Streaming, which is centered around mini-batches. In Spark Streaming, each mini-batch is executed like a regular batch program on a finite set of data, whereas Flink DataStream programs continuously process records. In Flink's…
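
The answer is cut off, but the reduceByKey part of the question can be sketched: on an unbounded stream, a per-key reduction has to be scoped to something finite, typically a window. A minimal sketch (the element type, 5-second window, and job name are illustrative):

    import org.apache.flink.streaming.api.scala._
    import org.apache.flink.streaming.api.windowing.time.Time

    object ReduceByKeySketch {
      def main(args: Array[String]): Unit = {
        val env = StreamExecutionEnvironment.getExecutionEnvironment

        env
          .fromElements(("a", 1), ("b", 2), ("a", 3))
          .keyBy(_._1)                           // group by key, like reduceByKey
          .timeWindow(Time.seconds(5))           // bound the infinite stream
          .reduce((x, y) => (x._1, x._2 + y._2)) // per-key, per-window sum
          .print()

        env.execute("reduce-by-key-sketch")
      }
    }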

What happens to state in a Flink Task Manager when it crashes?

Submitted by 懵懂的女人 on 2020-01-03 05:23:13
Question: May I know what happens to the state stored in a Flink task manager when that task manager crashes? Say the state storage is RocksDB; would that data be transferred to the other running task managers so that the complete state is available for data processing?

Answer 1: Flink does not (yet) support dynamic rescaling of state, so the failed task manager must be recovered, and the job will be restarted from a checkpoint. Exactly what that involves depends on how your cluster is configured, and whether the job failed…
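
Recovery after a task manager crash therefore presupposes checkpoints written to durable storage that outlives any single machine. A minimal configuration sketch (the 10-second interval and HDFS path are placeholders; RocksDBStateBackend comes from the flink-statebackend-rocksdb dependency):

    import org.apache.flink.contrib.streaming.state.RocksDBStateBackend
    import org.apache.flink.streaming.api.scala._

    val env = StreamExecutionEnvironment.getExecutionEnvironment

    // Snapshot state periodically so a restarted job can resume from the
    // latest completed checkpoint after a task manager failure.
    env.enableCheckpointing(10000L)

    // Working state lives in local RocksDB instances; checkpoints go to
    // durable storage that survives the loss of the machine.
    env.setStateBackend(new RocksDBStateBackend("hdfs:///flink/checkpoints"))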

Flink: possible to delete Queryable state after X time?

Submitted by 廉价感情. on 2020-01-03 02:56:07
Question: In my case, I use Flink's queryable state only. In particular, I do not care about checkpoints. Upon an event, I query the queryable state only after a maximum of X minutes. Ideally, I would delete the "old" state to save space. That's why I wonder: can I tell Flink's state to clear itself after some time? Through configuration? Through specific event signals? How?

Answer 1: One way to clear state is to explicitly call clear() on the state object (e.g., a ValueState object) when you no longer…
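
The answer is truncated; besides calling clear() by hand (for example from a registered timer), Flink's state TTL, a separate feature available since Flink 1.6, can expire state automatically after a fixed time. A minimal sketch (the state name, String type, and 10-minute TTL are illustrative):

    import org.apache.flink.api.common.state.{StateTtlConfig, ValueStateDescriptor}
    import org.apache.flink.api.common.time.Time

    // Expire entries 10 minutes after they were created or last written;
    // expired values are never returned, even before they are cleaned up.
    val ttlConfig = StateTtlConfig
      .newBuilder(Time.minutes(10))
      .setUpdateType(StateTtlConfig.UpdateType.OnCreateAndWrite)
      .setStateVisibility(StateTtlConfig.StateVisibility.NeverReturnExpired)
      .build()

    val descriptor = new ValueStateDescriptor[String]("my-state", classOf[String])
    descriptor.enableTimeToLive(ttlConfig)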