apache-flink

Can Flink write results into multiple files (like Hadoop's MultipleOutputFormat)?

北城余情 submitted on 2020-01-29 09:42:11
Question: I'm using Apache Flink's DataSet API. I want to implement a job that writes multiple results into different files. How can I do that?

Answer 1: You can add as many data sinks to a DataSet program as you need. For example, in a program like this:

    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
    DataSet<Tuple3<String, Long, Long>> data = env.readCsvFile(...);
    // apply MapFunction and emit
    data.map(new YourMapper()).writeAsText("/foo/bar");
    // apply FilterFunction and emit
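
A minimal, self-contained sketch of the same idea, assuming a hypothetical CSV input of (String, Long, Long) records; the paths, the threshold, and the second sink's projection are illustrative, not part of the original answer:

    import org.apache.flink.api.common.functions.MapFunction;
    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.api.java.tuple.Tuple3;

    public class MultipleSinksJob {
        public static void main(String[] args) throws Exception {
            ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

            // hypothetical input: CSV rows of (key, count, timestamp)
            DataSet<Tuple3<String, Long, Long>> data = env
                    .readCsvFile("/path/to/input.csv")
                    .types(String.class, Long.class, Long.class);

            // first result: rows whose count exceeds a threshold
            data.filter(t -> t.f1 > 100L)
                .writeAsText("/output/large-counts");

            // second result: just the key field of every row
            data.map(new MapFunction<Tuple3<String, Long, Long>, String>() {
                @Override
                public String map(Tuple3<String, Long, Long> t) {
                    return t.f0;
                }
            }).writeAsText("/output/keys");

            // one execute() runs both sinks as part of the same job
            env.execute("multiple sinks example");
        }
    }

Both sinks share a single pass over the input, so the data set is read once and written to two different locations.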

Flink - how to solve error This job is not stoppable

99封情书 submitted on 2020-01-25 11:11:18
Question: I tried to stop a job with flink stop:

    flink stop [jobid]

However, the CLI throws an error and does not let me stop the job. I could cancel it. What could be the reason here?

    Stopping job c7196bb1d21d679efed73770a4e4f9ed.
    ------------------------------------------------------------
     The program finished with the following exception:

    org.apache.flink.util.FlinkException: Could not stop the job c7196bb1d21d679efed73770a4e4f9ed.
        at org.apache.flink.client.cli.CliFrontend.lambda$stop$5
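
For context (not part of the truncated post): in older Flink releases, flink stop only succeeds if every source of the job implements the StoppableFunction interface; jobs with ordinary sources can only be cancelled. A minimal sketch of such a source, assuming a pre-1.9 Flink where that interface still exists (the class name and the emitted records are hypothetical):

    import org.apache.flink.api.common.functions.StoppableFunction;
    import org.apache.flink.streaming.api.functions.source.SourceFunction;

    // Only jobs whose sources implement StoppableFunction are "stoppable";
    // everything else has to be cancelled instead.
    public class MyStoppableSource implements SourceFunction<String>, StoppableFunction {

        private volatile boolean running = true;

        @Override
        public void run(SourceContext<String> ctx) throws Exception {
            while (running) {
                synchronized (ctx.getCheckpointLock()) {
                    ctx.collect("tick");
                }
                Thread.sleep(1000);
            }
        }

        @Override
        public void cancel() {
            running = false;    // hard cancellation path
        }

        @Override
        public void stop() {
            running = false;    // graceful stop: run() drains and returns on its own
        }
    }

In newer Flink versions the StoppableFunction interface was removed and flink stop instead performs a stop-with-savepoint, so the exact error depends on the version in use.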

Using Flink LocalEnvironment for Production

和自甴很熟 submitted on 2020-01-25 10:11:11
Question: I wanted to understand the limitations of LocalExecutionEnvironment and whether it can be used to run in production. Appreciate any help/insight. Thanks.

Answer 1: LocalExecutionEnvironment spins up a Flink MiniCluster, which runs the entire Flink system (JobManager, TaskManager) in a single JVM. So you're limited to the CPU cores and memory available on that one machine. You also don't have HA from multiple JobManagers. I haven't looked at other limitations of the MiniCluster environment, but I'm sure
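
For illustration only (the parallelism value is an arbitrary assumption): a local environment is what you get from the createLocalEnvironment factory, which starts the embedded MiniCluster described in the answer inside the current JVM.

    import org.apache.flink.api.java.ExecutionEnvironment;

    // Runs JobManager and TaskManager inside this JVM; parallelism and state
    // are therefore bounded by this one machine, and there is no JobManager HA.
    ExecutionEnvironment localEnv = ExecutionEnvironment.createLocalEnvironment(4);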

Flink - ElasticSearch Sink - error handling

我怕爱的太早我们不能终老 submitted on 2020-01-25 09:32:06
Question: I am trying to follow this Flink guide [1] to handle errors in ElasticSearchSink by re-adding the failed messages to the queue. The error scenarios that I hit and want to retry are: (i) a conflict on the UpdateRequest document version, and (ii) a lost connection to ElasticSearch. These errors are expected to be transient and should go away by (i) changing the version / (ii) waiting a few seconds. What I expect is that the message is retried successfully. What I actually got was: Flink seemed to get stuck
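
A sketch in the spirit of the failure handler shown in the connector documentation. The exception class, the status-code checks, and the class name are assumptions to adjust to the Elasticsearch client and connector version in use:

    import org.apache.flink.streaming.connectors.elasticsearch.ActionRequestFailureHandler;
    import org.apache.flink.streaming.connectors.elasticsearch.RequestIndexer;
    import org.apache.flink.util.ExceptionUtils;
    import org.elasticsearch.action.ActionRequest;
    import org.elasticsearch.index.engine.VersionConflictEngineException;

    // Re-queues transient failures by handing the request back to the
    // RequestIndexer; anything unexpected is rethrown so the sink fails fast
    // instead of silently dropping data.
    public class RetryTransientFailureHandler implements ActionRequestFailureHandler {

        @Override
        public void onFailure(ActionRequest action,
                              Throwable failure,
                              int restStatusCode,
                              RequestIndexer indexer) throws Throwable {
            boolean versionConflict = ExceptionUtils
                    .findThrowable(failure, VersionConflictEngineException.class)
                    .isPresent();

            // 409 = version conflict; -1 typically means there was no REST
            // response at all (e.g. the connection to the cluster dropped)
            if (versionConflict || restStatusCode == 409 || restStatusCode == -1) {
                indexer.add(action);   // retried with the next bulk flush
            } else {
                throw failure;         // let Flink's failure handling take over
            }
        }
    }

Note that re-adding the identical UpdateRequest after a version conflict can keep conflicting on the same stale version, which may be why the job appears stuck; the retried request usually needs to be rebuilt against the current document version.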

Error About Deployment of Flink on Yarn

孤街醉人 submitted on 2020-01-25 05:28:12
Question: I tried to deploy Flink on YARN, but it failed. It seems that YARN could not launch the container. Does anyone know this problem? Any suggestion would be appreciated. When I start Flink like this:

    [admin@bufer108072.tbc ~/flink-0.10-SNAPSHOT]$ bin/yarn-session.sh -n 4

I get the following console print-out:

    09:16:35,069 INFO  org.apache.flink.yarn.FlinkYarnCluster - Start application client.
    Flink JobManager is now running on bufer108132.tbc:34408
    JobManager Web Interface: http://bufer108072.tbc.tbsite
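
Two things that often help when YARN refuses to launch containers (general YARN/Flink practice, not taken from an answer in this post): pull the container logs for the failed application, and make sure the requested container memory fits within the NodeManager limits. The application id and memory sizes below are placeholders:

    # standard YARN CLI; use the application id printed by yarn-session.sh
    yarn logs -applicationId application_1422917838027_0001

    # yarn-session flags of that Flink generation: -jm/-tm set JobManager and
    # TaskManager container memory (MB); keep them below yarn.scheduler.maximum-allocation-mb
    bin/yarn-session.sh -n 4 -jm 1024 -tm 2048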

Can we combine both and count and process time Trigger in Flink?

99封情书 submitted on 2020-01-24 20:27:11
Question: I want the window to be evaluated either when the count reaches 100, or every 5 seconds of tumbling processing time. That is to say, when the number of elements reaches 100, trigger the window computation; and if the elements don't reach 100 but 5 seconds have elapsed, also trigger the window computation, just like a combination of the two triggers below:

    .countWindow(100)
    .window(TumblingProcessingTimeWindows.of(Time.seconds(5)))

Answer 1: There's no super simple way to do this with the
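
One common way to get this behavior (a sketch, not the truncated answer's own code) is a tumbling processing-time window with a custom Trigger that fires early once the per-window count reaches the threshold; the class name and the choice to FIRE_AND_PURGE on the count path are assumptions:

    import org.apache.flink.api.common.functions.ReduceFunction;
    import org.apache.flink.api.common.state.ReducingState;
    import org.apache.flink.api.common.state.ReducingStateDescriptor;
    import org.apache.flink.api.common.typeutils.base.LongSerializer;
    import org.apache.flink.streaming.api.windowing.triggers.Trigger;
    import org.apache.flink.streaming.api.windowing.triggers.TriggerResult;
    import org.apache.flink.streaming.api.windowing.windows.TimeWindow;

    // Fires when the per-window element count reaches maxCount, or when the
    // window's processing-time end is reached, whichever happens first.
    public class CountOrProcessingTimeTrigger extends Trigger<Object, TimeWindow> {

        private final long maxCount;
        private final ReducingStateDescriptor<Long> countDesc =
                new ReducingStateDescriptor<>("count", new SumLongs(), LongSerializer.INSTANCE);

        public CountOrProcessingTimeTrigger(long maxCount) {
            this.maxCount = maxCount;
        }

        @Override
        public TriggerResult onElement(Object element, long timestamp, TimeWindow window,
                                       TriggerContext ctx) throws Exception {
            // make sure the window still fires at its end even below the count
            ctx.registerProcessingTimeTimer(window.maxTimestamp());

            ReducingState<Long> count = ctx.getPartitionedState(countDesc);
            count.add(1L);
            if (count.get() >= maxCount) {
                count.clear();
                // emit and drop the buffered elements; use FIRE to keep them
                return TriggerResult.FIRE_AND_PURGE;
            }
            return TriggerResult.CONTINUE;
        }

        @Override
        public TriggerResult onProcessingTime(long time, TimeWindow window, TriggerContext ctx) {
            return TriggerResult.FIRE;
        }

        @Override
        public TriggerResult onEventTime(long time, TimeWindow window, TriggerContext ctx) {
            return TriggerResult.CONTINUE;
        }

        @Override
        public void clear(TimeWindow window, TriggerContext ctx) throws Exception {
            ctx.getPartitionedState(countDesc).clear();
            ctx.deleteProcessingTimeTimer(window.maxTimestamp());
        }

        private static class SumLongs implements ReduceFunction<Long> {
            @Override
            public Long reduce(Long a, Long b) {
                return a + b;
            }
        }
    }

It would be attached in place of the default trigger, e.g. .window(TumblingProcessingTimeWindows.of(Time.seconds(5))).trigger(new CountOrProcessingTimeTrigger(100)).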

Controlled/manual error/recovery handling in stream-based applications

风格不统一 submitted on 2020-01-24 11:21:35
Question: I am working on an application based on Apache Flink, which makes use of Apache Kafka for input and output. Possibly this application will be ported to Apache Spark, so I have added that as a tag as well, and the question remains the same. I have the requirement that all incoming messages received via Kafka must be processed in order, as well as safely stored in a persistence layer (database), and no message must get lost. The streaming part in this application is rather trivial/small, as the
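
Not taken from the post itself, but the standard Flink building block for the "no message must get lost" part is checkpointing with a replayable Kafka source: on failure the job rewinds to the last checkpoint and re-reads from Kafka. A sketch under the assumption of a recent Kafka connector (class names such as FlinkKafkaConsumer and the topic, group id, and sink are placeholders):

    import java.util.Properties;

    import org.apache.flink.api.common.serialization.SimpleStringSchema;
    import org.apache.flink.streaming.api.CheckpointingMode;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

    public class KafkaToDatabaseJob {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            // Kafka offsets are stored as part of each checkpoint, so after a
            // failure the source rewinds and replays instead of losing messages.
            env.enableCheckpointing(10_000L, CheckpointingMode.EXACTLY_ONCE);

            Properties props = new Properties();
            props.setProperty("bootstrap.servers", "localhost:9092");
            props.setProperty("group.id", "my-consumer-group");

            env.addSource(new FlinkKafkaConsumer<>("input-topic", new SimpleStringSchema(), props))
               // a real job would use an idempotent or transactional database sink here,
               // so that replayed messages do not create duplicates
               .print();

            env.execute("kafka to database");
        }
    }

Replay means the database sink must tolerate re-processing (idempotent writes or transactions), which is the controlled-recovery concern the question raises.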

Kafka consuming the latest message again when I rerun the Flink consumer

对着背影说爱祢 submitted on 2020-01-23 10:57:48
Question: I have created a Kafka consumer in the Apache Flink API, written in Scala. Whenever I pass some messages from a topic, it duly receives them. However, when I restart the consumer, instead of receiving the new or unconsumed messages, it consumes the latest message that was sent to that topic. Here's what I am doing.

Running the producer:

    $ bin/kafka-console-producer.sh --broker-list localhost:9092 --topic corr2

Running the consumer:

    val properties = new Properties()
    properties.setProperty(
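
A sketch of the usual remedy (in Java rather than the question's Scala, and with a hypothetical group id): give the consumer a stable group.id and enable checkpointing so the source stores and restores its Kafka offsets, using auto.offset.reset only as the fallback for the very first run. The unversioned FlinkKafkaConsumer class is an assumption; older setups use version-suffixed classes such as FlinkKafkaConsumer09.

    import java.util.Properties;

    import org.apache.flink.api.common.serialization.SimpleStringSchema;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

    public class Corr2Consumer {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            // offsets become part of Flink's checkpointed state, so a restart
            // resumes from the last completed checkpoint instead of re-reading
            env.enableCheckpointing(5_000L);

            Properties properties = new Properties();
            properties.setProperty("bootstrap.servers", "localhost:9092");
            // a stable group id lets committed offsets survive restarts
            properties.setProperty("group.id", "corr2-consumer");
            // only consulted when no committed offset exists yet
            properties.setProperty("auto.offset.reset", "earliest");

            env.addSource(new FlinkKafkaConsumer<>("corr2", new SimpleStringSchema(), properties))
               .print();

            env.execute("corr2 consumer");
        }
    }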