apache-flink

External checkpoints to S3 on EMR

Submitted by 放肆的年华 on 2020-01-02 22:07:25
Question: I am trying to deploy a production cluster for my Flink program. I am using a standard hadoop-core EMR cluster with Flink 1.3.2 installed, using YARN to run it. I am trying to configure RocksDB to write my checkpoints to an S3 bucket, following these docs: https://ci.apache.org/projects/flink/flink-docs-release-1.3/setup/aws.html#set-s3-filesystem. The problem seems to be getting the dependencies working correctly. I receive this error when trying to run the program: java.lang
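For reference, a minimal sketch of the job-side setup this entry is aiming for, assuming the S3 filesystem dependencies from the linked docs are already on the cluster classpath (the bucket path is a placeholder):

    import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class S3CheckpointJob {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            // Take a checkpoint every 60 seconds
            env.enableCheckpointing(60_000);
            // RocksDB writes its checkpoint data to S3; the 's3://' scheme is
            // resolved via the Hadoop S3 filesystem, which is why the Hadoop
            // and AWS dependency jars must be visible to Flink at runtime.
            env.setStateBackend(new RocksDBStateBackend("s3://my-bucket/flink/checkpoints"));
            // ... define sources, transformations, sinks ...
            env.execute("s3-checkpoint-example");
        }
    }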

Flink TaskManagers do not start until job is submitted in YARN cluster

Submitted by 偶尔善良 on 2020-01-02 18:24:30
Question: I am using Amazon EMR to run a Flink cluster on YARN. My setup consists of m4.large instances for 1 master and 2 core nodes. I started the Flink cluster on YARN with the command: flink-yarn-session -n 2 -d -tm 4096 -s 4. The Flink Job Manager and Application Manager start, but there are no Task Managers running. The Flink Web interface shows 0 for task managers, task slots, and available slots. However, when I submit a job to the Flink cluster, Task Managers get allocated and the job runs and

Flink Web UI not displaying records received in a Custom Source implementation

Submitted by 眉间皱痕 on 2020-01-02 05:23:14
Question: I have built a custom source to process a log stream in Flink. The program runs fine and gives me the desired results after processing the records. But when I check the Web UI, I do not see the counts. Below is the screenshot: [Screenshot: Records/Bytes Count]

Answer 1: Flink chained all the operators of your pipeline into one operator: Source -> FlatMap -> ProcessLog -> Sink. Thus, this single operator contains the source and the sink. Additionally, Flink can neither measure the amount of bytes read
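If the per-operator counts are needed, one option (a sketch; operator names here are made up) is to break the chain so records cross a measurable operator boundary. This trades some serialization overhead for visibility, and the source itself will still report 0 records received, since Flink only counts records crossing operator boundaries:

    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class ChainingDemo {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            // Give every operator its own task so the Web UI can show the
            // records and bytes flowing between them (costs extra serialization).
            env.disableOperatorChaining();

            env.fromElements("a", "b", "c")
               .map(s -> s.toUpperCase()) // now a separate task with visible counts
               .print();

            env.execute("chaining-demo");
        }
    }

Alternatively, calling .startNewChain() (or .disableChaining()) on a single operator breaks the chain at just that point instead of everywhere.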

flink - using dagger injections - not serializable?

Submitted by 三世轮回 on 2020-01-01 02:56:12
Question: I'm using Flink (latest via git) to stream from Kafka to Cassandra. To ease unit testing I'm adding dependency injection via Dagger. The ObjectGraph seems to be setting itself up properly, but the 'inner objects' are being flagged as 'not serializable' by Flink. If I include these objects directly they work, so what's the difference? The class in question implements MapFunction and @Injects a module for Cassandra and one for reading config files. Is there a way to build this so I can use late
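One common pattern for this (a sketch; CassandraClient and ObjectGraphHolder are hypothetical stand-ins for the injected type and a helper that rebuilds the Dagger graph) is to mark the injected field transient and defer injection to open(), so Flink never has to serialize the non-serializable object:

    import org.apache.flink.api.common.functions.RichMapFunction;
    import org.apache.flink.configuration.Configuration;

    public class EnrichFunction extends RichMapFunction<String, String> {
        // transient: Flink skips this field when shipping the function,
        // so it never needs to be serializable
        private transient CassandraClient cassandra;

        @Override
        public void open(Configuration parameters) {
            // Build the object graph on the TaskManager after deserialization,
            // instead of shipping the injected objects from the client
            cassandra = ObjectGraphHolder.get().cassandraClient();
        }

        @Override
        public String map(String value) {
            return cassandra.enrich(value);
        }
    }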

Flink latency metrics not being shown

Submitted by …衆ロ難τιáo~ on 2019-12-31 06:57:32
Question: While running Flink 1.5.0 with a local environment I was trying to get latency metrics via REST (with something similar to http://localhost:8081/jobs/e779dbbed0bfb25cd02348a2317dc8f1/vertices/e70bbd798b564e0a50e10e343f1ac56b/metrics), but there is no reference to latency. All of this while latency tracking is enabled, which I confirmed by checking with the debugger that the LatencyMarksEmitter is emitting the marks. What could I be doing wrong?

Answer 1: In 1.5 latency metrics aren't exposed
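For context, enabling latency tracking itself looks like this (a sketch; note that in 1.5 the resulting metrics surface at the job level, i.e. under /jobs/<jobid>/metrics, rather than under a vertex as queried above):

    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class LatencyDemo {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            // Emit a LatencyMarker through the pipeline every 1000 ms;
            // without this, no latency metrics are produced at all.
            env.getConfig().setLatencyTrackingInterval(1000);

            env.fromElements(1, 2, 3).print();
            env.execute("latency-demo");
        }
    }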

How to sort an out-of-order event time stream using Flink

Submitted by 倾然丶 夕夏残阳落幕 on 2019-12-31 03:07:06
Question: This question covers how to sort an out-of-order stream using Flink SQL, but I would rather use the DataStream API. One solution is to use a ProcessFunction with a PriorityQueue that buffers events until the watermark indicates they are no longer out of order, but this performs poorly with the RocksDB state backend: each access to the PriorityQueue requires ser/de of the entire queue. How can I do this efficiently regardless of which state backend
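One pattern that works well with RocksDB (a sketch; Event is a hypothetical POJO with a public long timestamp field, and timestamps/watermarks are assumed to be assigned upstream) is to buffer events in MapState keyed by timestamp and drain it with event-time timers, so each insert touches a single map entry instead of deserializing a whole queue:

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.flink.api.common.state.MapState;
    import org.apache.flink.api.common.state.MapStateDescriptor;
    import org.apache.flink.api.common.typeinfo.Types;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
    import org.apache.flink.util.Collector;

    public class SortFunction extends KeyedProcessFunction<String, Event, Event> {
        // timestamp -> events with that timestamp; with RocksDB, each map
        // entry is a separate key, so inserts do not ser/de the whole buffer
        private MapState<Long, List<Event>> buffer;

        @Override
        public void open(Configuration config) {
            buffer = getRuntimeContext().getMapState(new MapStateDescriptor<>(
                "buffer", Types.LONG, Types.LIST(Types.POJO(Event.class))));
        }

        @Override
        public void processElement(Event e, Context ctx, Collector<Event> out) throws Exception {
            if (e.timestamp > ctx.timerService().currentWatermark()) { // drop late events
                List<Event> atTs = buffer.get(e.timestamp);
                if (atTs == null) atTs = new ArrayList<>();
                atTs.add(e);
                buffer.put(e.timestamp, atTs);
                // Fire once the watermark passes this event's timestamp
                ctx.timerService().registerEventTimeTimer(e.timestamp);
            }
        }

        @Override
        public void onTimer(long ts, OnTimerContext ctx, Collector<Event> out) throws Exception {
            // Timers fire in timestamp order, so emission is sorted
            for (Event e : buffer.get(ts)) out.collect(e);
            buffer.remove(ts);
        }
    }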

Difference between job, task and subtask in flink

Submitted by 非 Y 不嫁゛ on 2019-12-31 00:45:18
Question: I'm new to Flink and am trying to understand the terms job, task and subtask. I searched the docs but still did not get it. What's the main difference between them?

Answer 1: Tasks and subtasks are explained here: https://ci.apache.org/projects/flink/flink-docs-release-1.7/concepts/runtime.html#tasks-and-operator-chains. A task is an abstraction representing a chain of operators that could be executed in a single thread. Something like a keyBy (which causes a network shuffle to partition the stream by some key) or
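As a rough illustration (a sketch with made-up operator names; MySource stands in for a hypothetical ParallelSourceFunction): everything submitted by one execute() call is a job, each chain of operators between shuffles is a task, and with parallelism 2 each task runs as two subtasks:

    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    env.setParallelism(2);

    env.addSource(new MySource())    // task 1: source -> flatMap chain,
       .flatMap(new Tokenizer())     //   executed as 2 parallel subtasks
       .keyBy(t -> t)                // the network shuffle ends the chain
       .map(new Counter())           // task 2: map -> print chain,
       .print();                     //   another 2 subtasks

    env.execute("job-task-subtask"); // one job: the whole submitted pipeline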

Flink checkpoints to Google Cloud Storage

Submitted by 无人久伴 on 2019-12-30 11:05:21
Question: I am trying to configure checkpoints for Flink jobs in GCS. Everything works fine if I run a test job locally (no Docker and no cluster setup), but it fails with an error if I run it using docker-compose or a cluster setup and deploy the fat jar with the jobs through the Flink dashboard. Any thoughts? Thanks!

    Caused by: org.apache.flink.core.fs.UnsupportedFileSystemSchemeException: Could not find a file system implementation for scheme 'gs'. The scheme is not directly supported by Flink and no Hadoop file
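For what it's worth, the job-side configuration is just a URI (a sketch; the bucket is a placeholder). The 'gs' scheme is resolved from the cluster's own classpath at runtime, not from the user fat jar, which is the usual reason a job works locally but fails once deployed:

    import org.apache.flink.runtime.state.filesystem.FsStateBackend;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class GcsCheckpointJob {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            env.enableCheckpointing(60_000);
            // Resolving 'gs://' requires the GCS Hadoop connector jar on the
            // JobManager/TaskManager classpath (e.g. in the image's lib/ dir),
            // not merely bundled into the user fat jar.
            env.setStateBackend(new FsStateBackend("gs://my-bucket/flink/checkpoints"));
            env.execute("gcs-checkpoint-example");
        }
    }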

Apache flink on Kubernetes - Resume job if jobmanager crashes

Submitted by 蹲街弑〆低调 on 2019-12-30 02:33:05
Question: I want to run a Flink job on Kubernetes, using a (persistent) state backend. It seems like crashing TaskManagers are no issue, as they can ask the JobManager which checkpoint they need to recover from, if I understand correctly. A crashing JobManager seems to be a bit more difficult. On this FLIP-6 page I read that ZooKeeper is needed to know which checkpoint the JobManager has to use for recovery, and for leader election. Seeing as Kubernetes will restart the JobManager whenever it crashes

Flink on Yarn, parallel source with Kafka

Submitted by 橙三吉。 on 2019-12-25 12:10:06
Question: I am trying to get parallelism with my Kafka source within my Flink job, but I have failed so far. I set 4 partitions on my Kafka topic:

    $ ./bin/kafka-topics.sh --describe --zookeeper X.X.X.X:2181 --topic mytopic
    Topic:mytopic   PartitionCount:4   ReplicationFactor:1   Configs:
        Topic: mytopic   Partition: 0   Leader: 0   Replicas: 0   Isr: 0
        Topic: mytopic   Partition: 1   Leader: 0   Replicas: 0   Isr: 0
        Topic: mytopic   Partition: 2   Leader: 0   Replicas: 0   Isr: 0
        Topic: mytopic   Partition: 3   Leader: 0   Replicas: 0   Isr
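A minimal consumer-side sketch (broker address and group id are placeholders; older releases use a versioned class such as FlinkKafkaConsumer010). With 4 partitions, a source parallelism of 4 gives each subtask one partition, while anything above 4 leaves the extra subtasks idle:

    import java.util.Properties;
    import org.apache.flink.api.common.serialization.SimpleStringSchema;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

    public class ParallelKafkaSource {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            Properties props = new Properties();
            props.setProperty("bootstrap.servers", "X.X.X.X:9092"); // placeholder broker
            props.setProperty("group.id", "my-group");              // placeholder group

            env.addSource(new FlinkKafkaConsumer<>("mytopic", new SimpleStringSchema(), props))
               .setParallelism(4) // one source subtask per Kafka partition
               .print();

            env.execute("parallel-kafka-source");
        }
    }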