apache-flink

How to handle execution timeout in Flink

我的梦境 submitted on 2019-12-08 07:24:27
Question: Job submission fails after connecting to the JobManager:

    Connected to JobManager at Actor[akka.tcp://flink@localhost:6123/user/jobmanager#-1119198862] with leader session id 00000000-0000-0000-0000-000000000000.
    org.apache.flink.client.program.ProgramInvocationException: The program execution failed: Couldn't retrieve the JobExecutionResult from the JobManager.
        at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:478)
        at org.apache.flink.client.program.StandaloneClusterClient.submitJob(StandaloneClusterClient.java:105)
        at org
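A common remedy for "Couldn't retrieve the JobExecutionResult" is to raise the client-side Akka timeouts, since the client otherwise gives up waiting for the JobManager's answer on a slow or busy cluster. A minimal sketch for a Flink 1.x local environment; the keys come from the Flink 1.x configuration docs, and the 600-second values are illustrative, not a recommendation:

    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    Configuration conf = new Configuration();
    // Give the client more time to wait for the JobExecutionResult.
    conf.setString("akka.client.timeout", "600 s");
    conf.setString("akka.ask.timeout", "600 s");

    StreamExecutionEnvironment env =
        StreamExecutionEnvironment.createLocalEnvironment(1, conf);

On a standalone cluster the same keys can be set in conf/flink-conf.yaml instead.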

Simple Scala API for CEP example doesn't show any output

旧街凉风 submitted on 2019-12-08 05:25:05
Question: I'm writing a simple example to test the new Scala API for CEP in Flink, using the latest GitHub version of 1.1-SNAPSHOT. The pattern is only a check on one value, and it outputs a single String for each matched pattern. The code is as follows:

    val pattern : Pattern[(String, Long, Int), _] = Pattern.begin("start").where(_._3 < 4)
    val cepEventAlert = CEP.pattern(streamingAlert, pattern)
    def selectFn(pattern : mutable.Map[String, (String, Long, Int)]): String = {
      val startEvent =
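When a CEP job produces no output, the usual suspects are a missing env.execute() call at the end of the program, or running on event time without assigning timestamps and watermarks, so the pattern never fires. For reference, a minimal Java sketch of the same pattern against the later stable CEP API (Flink 1.4+); streamingAlert and the tuple layout are taken from the question, everything else is an assumption:

    import java.util.List;
    import java.util.Map;

    import org.apache.flink.api.java.tuple.Tuple3;
    import org.apache.flink.cep.CEP;
    import org.apache.flink.cep.PatternSelectFunction;
    import org.apache.flink.cep.PatternStream;
    import org.apache.flink.cep.pattern.Pattern;
    import org.apache.flink.cep.pattern.conditions.SimpleCondition;
    import org.apache.flink.streaming.api.datastream.DataStream;

    Pattern<Tuple3<String, Long, Integer>, ?> pattern =
        Pattern.<Tuple3<String, Long, Integer>>begin("start")
            .where(new SimpleCondition<Tuple3<String, Long, Integer>>() {
                @Override
                public boolean filter(Tuple3<String, Long, Integer> event) {
                    return event.f2 < 4; // same check as _._3 < 4 in the Scala version
                }
            });

    PatternStream<Tuple3<String, Long, Integer>> cepEventAlert =
        CEP.pattern(streamingAlert, pattern);

    DataStream<String> alerts = cepEventAlert.select(
        new PatternSelectFunction<Tuple3<String, Long, Integer>, String>() {
            @Override
            public String select(Map<String, List<Tuple3<String, Long, Integer>>> match) {
                return "matched: " + match.get("start").get(0);
            }
        });

    alerts.print();
    env.execute("cep-example"); // easy to forget; without it the pipeline never runs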

Apache Flink Using Windows to induce a delay before writing to Sink

廉价感情. submitted on 2019-12-08 05:13:17
Question: I am wondering whether it is possible, with Flink windowing, to induce a 10-minute delay from when data enters the pipeline until it is written to a table in Cassandra. My initial intention was to write each transaction to a Cassandra table and query it using a range key at the web layer, but due to the volume of data I am looking at options to delay the write by N seconds. This means my table will only ever contain data that is at least 10 minutes old. The small diagram below shows 10
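One way to get this behavior is to buffer records in a processing-time window and only emit them when the window fires, so nothing reaches the sink until the window closes. A sketch against the Flink 1.x-era API, under assumptions: Transaction, getAccountId, and cassandraSink are hypothetical stand-ins for the question's types. Note the caveat that the per-record delay ranges from 0 to 10 minutes (a record arriving just before the window closes is barely delayed); an exact per-record delay would need a ProcessFunction with timers instead.

    import org.apache.flink.api.java.functions.KeySelector;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.functions.windowing.WindowFunction;
    import org.apache.flink.streaming.api.windowing.time.Time;
    import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
    import org.apache.flink.util.Collector;

    DataStream<Transaction> delayed = transactions
        .keyBy(new KeySelector<Transaction, String>() {
            @Override
            public String getKey(Transaction t) {
                return t.getAccountId(); // hypothetical key field
            }
        })
        .timeWindow(Time.minutes(10)) // processing-time windows when no event time is configured
        .apply(new WindowFunction<Transaction, Transaction, String, TimeWindow>() {
            @Override
            public void apply(String key, TimeWindow window,
                              Iterable<Transaction> input, Collector<Transaction> out) {
                // Emit everything buffered in this window only when it fires.
                for (Transaction t : input) {
                    out.collect(t);
                }
            }
        });

    delayed.addSink(cassandraSink); // hypothetical sink instance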

AWS SDK conflicts in Apache Flink environment

若如初见. submitted on 2019-12-08 04:03:44
Question: I'm trying to deploy my job to a Flink environment and always get this error:

    java.lang.NoSuchMethodError: com.amazonaws.AmazonWebServiceRequest.putCustomQueryParameter(Ljava/lang/String;Ljava/lang/String;)

I've tried including and excluding aws-sdk in my jar, but it didn't help. Does anyone know how to resolve these conflicts?

Answer 1: Apache Flink loads many classes into its classpath by default, so your problem is simply a version conflict. Please read the last section of this article: https://ci
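A NoSuchMethodError like this usually means an older aws-sdk class somewhere on Flink's classpath shadows the one bundled in the job jar. Before shading or relocating anything, it can help to confirm which jar actually wins at runtime; a small diagnostic sketch (the class name comes from the error above, the rest is an assumption):

    // Print the jar from which the conflicting class was actually loaded,
    // e.g. at the start of main() in the job.
    Class<?> clazz = com.amazonaws.AmazonWebServiceRequest.class;
    System.out.println(clazz.getProtectionDomain().getCodeSource().getLocation());

If it points at a jar in Flink's lib directory rather than your fat jar, relocating the com.amazonaws packages with the Maven Shade plugin (or removing the stale jar from lib) is the usual fix.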

Apache Flink Streaming window WordCount

扶醉桌前 submitted on 2019-12-08 03:30:54
Question: I have the following code to count words from socketTextStream. Both cumulative word counts and time-windowed word counts are needed. The program has an issue: cumulateCounts is always the same as the windowed counts. Why does this happen, and what is the correct way to compute cumulative counts based on the windowed counts?

    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    final HashMap<String, Integer> cumulateCounts = new HashMap<String, Integer>();
    final DataStream
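A shared HashMap captured in a closure is a poor accumulator here: Flink serializes the function (and the map) into each parallel task, so every task mutates its own private copy and nothing is checkpointed. A sketch of the usual pattern: keep the windowed counts as one stream and derive cumulative counts from it with a second keyed running sum, letting Flink manage the totals as state (the host, port, and window size are illustrative):

    import org.apache.flink.api.common.functions.FlatMapFunction;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.api.windowing.time.Time;
    import org.apache.flink.util.Collector;

    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

    DataStream<Tuple2<String, Integer>> windowCounts = env
        .socketTextStream("localhost", 9999)
        .flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() {
            @Override
            public void flatMap(String line, Collector<Tuple2<String, Integer>> out) {
                for (String word : line.split("\\s+")) {
                    out.collect(new Tuple2<>(word, 1));
                }
            }
        })
        .keyBy(0)
        .timeWindow(Time.seconds(5))
        .sum(1);

    // Cumulative counts: a second keyed running sum over the windowed results.
    DataStream<Tuple2<String, Integer>> cumulativeCounts = windowCounts
        .keyBy(0)
        .sum(1);

    windowCounts.print();
    cumulativeCounts.print();
    env.execute("windowed-and-cumulative-wordcount");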

Apache Flink - how to send and consume POJOs using AWS Kinesis

那年仲夏 submitted on 2019-12-08 02:51:51
Question: I want to consume POJOs arriving from Kinesis with Flink. Is there a standard way to correctly send and deserialize the messages? Thanks

Answer 1: I resolved it with:

    DataStream<SamplePojo> kinesis = see.addSource(new FlinkKinesisConsumer<>(
        "my-stream",
        new POJODeserializationSchema(),
        kinesisConsumerConfig));

and

    public class POJODeserializationSchema extends AbstractDeserializationSchema<SamplePojo> {
        private ObjectMapper mapper;
        @Override
        public SamplePojo deserialize(byte[] message)
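The answer's schema is cut off above; one plausible way to finish it, assuming the messages are JSON and SamplePojo is a Jackson-mappable bean (the import path of AbstractDeserializationSchema varies across Flink versions; the one below matches 1.4+, and the lazy ObjectMapper initialization avoids shipping the mapper with the serialized function):

    import java.io.IOException;

    import com.fasterxml.jackson.databind.ObjectMapper;
    import org.apache.flink.api.common.serialization.AbstractDeserializationSchema;

    public class POJODeserializationSchema extends AbstractDeserializationSchema<SamplePojo> {
        private transient ObjectMapper mapper;

        @Override
        public SamplePojo deserialize(byte[] message) throws IOException {
            if (mapper == null) {
                mapper = new ObjectMapper(); // created per task, after the function is deserialized
            }
            return mapper.readValue(message, SamplePojo.class);
        }
    }

On the producing side the mirror image would be mapper.writeValueAsBytes(pojo) before putting the record onto the stream.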

Flink: How to read from multiple Kafka clusters using the same StreamExecutionEnvironment

半世苍凉 submitted on 2019-12-08 02:46:20
Question: I want to read data from multiple Kafka clusters in Flink, but kafkaMessageStream ends up reading only from the first cluster. I can read from both clusters only if I create two separate streams, which is not what I want. Is it possible to attach multiple sources to a single reader? Sample code:

    public class KafkaReader<T> implements Reader<T> {
        private StreamExecutionEnvironment executionEnvironment;
        public StreamExecutionEnvironment
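A single Kafka source only ever talks to the one cluster named in its bootstrap.servers property, so reading from two clusters means two sources; they can still feed one logical stream by unioning them, which is usually what "multiple sources attached to a single reader" amounts to in practice. A sketch, where env is the question's StreamExecutionEnvironment and the cluster addresses, topic names, group ids, and 0.10 connector version are assumptions:

    import java.util.Properties;

    import org.apache.flink.api.common.serialization.SimpleStringSchema;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010;

    Properties propsA = new Properties();
    propsA.setProperty("bootstrap.servers", "cluster-a:9092");
    propsA.setProperty("group.id", "reader-a");

    Properties propsB = new Properties();
    propsB.setProperty("bootstrap.servers", "cluster-b:9092");
    propsB.setProperty("group.id", "reader-b");

    DataStream<String> streamA = env.addSource(
        new FlinkKafkaConsumer010<>("topic-a", new SimpleStringSchema(), propsA));
    DataStream<String> streamB = env.addSource(
        new FlinkKafkaConsumer010<>("topic-b", new SimpleStringSchema(), propsB));

    // One downstream pipeline over both clusters.
    DataStream<String> kafkaMessageStream = streamA.union(streamB);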

Python + Beam + Flink

匆匆过客 submitted on 2019-12-08 00:40:47
Question: I've been trying to get the Apache Beam Portability Framework working with Python and Apache Flink, and I can't seem to find a complete set of instructions for getting the environment running. Are there any references with a complete list of prerequisites and steps to get a simple Python pipeline working?

Answer 1: Overall, for the local portable runner (ULR), see the wiki; quoting from there:

Run a Python-SDK pipeline:

    Compile the container as a local build: ./gradlew :beam-sdks-python-container:docker
    Start ULR

Flink program cannot submit when I follow Flink 1.4's quickstart and use "./bin/flink run examples/streaming/SocketWindowWordCount.jar --port 9000"

拥有回忆 submitted on 2019-12-07 19:26:51
Question: Flink 1.4 quickstart address: https://ci.apache.org/projects/flink/flink-docs-release-1.4/quickstart/setup_quickstart.html. Following the quickstart, I start Flink with "./bin/start-local.sh", check http://localhost:8081/ to make sure everything is running, and then submit the jar with "./bin/flink run examples/streaming/SocketWindowWordCount.jar --port 9000". I get the following output and the submission fails:

    ---------------------------------------------------------
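The error output is cut off above, but with this exact example a very common cause is that nothing is listening on port 9000 yet: the quickstart starts "nc -l 9000" before submitting the job, and without it the socket source cannot connect. If netcat is unavailable, a hypothetical stand-in that listens on the port and pumps stdin lines to the connected job:

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.io.PrintWriter;
    import java.net.ServerSocket;
    import java.net.Socket;

    // Hypothetical replacement for `nc -l 9000`: accept one connection
    // and forward every stdin line to it.
    public class LinePump {
        public static void main(String[] args) throws IOException {
            try (ServerSocket server = new ServerSocket(9000);
                 Socket client = server.accept();
                 BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
                 PrintWriter out = new PrintWriter(client.getOutputStream(), true)) {
                String line;
                while ((line = in.readLine()) != null) {
                    out.println(line);
                }
            }
        }
    }

Start it first, then run the ./bin/flink run command from the question.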
