apache-flink

How to avoid a null value exception in flink-kafka? Any help would do

别等时光非礼了梦想 submitted on 2019-12-01 12:08:35
I'm trying to write a job that raises an alert when the temperature is above a threshold (as defined in the code), but the keyed stream is causing problems. I'm new to Flink and intermediate in Scala. I need help with this code; I've tried almost everything.

    def main(args: Array[String]): Unit = {
      val TEMPERATURE_THRESHOLD: Double = 50.00
      val see: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment
      val properties = new Properties()
      properties.setProperty("bootstrap.servers", "localhost:9092")
      properties.setProperty("zookeeper.connect", "localhost:2181")
      val src = see
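A rough Java sketch of the same pipeline, assuming the Kafka 0.8 connector (which is what the zookeeper.connect property suggests), a hypothetical topic name, and CSV payloads of the form "sensorId,temperature"; malformed or empty records are dropped at the parsing step instead of being allowed to throw, which is one way to avoid null-value exceptions coming out of the source:

    import java.util.Properties;

    import org.apache.flink.api.common.functions.FlatMapFunction;
    import org.apache.flink.api.common.serialization.SimpleStringSchema;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer08;
    import org.apache.flink.util.Collector;

    public class TemperatureAlertJob {

        static final double TEMPERATURE_THRESHOLD = 50.00;

        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            Properties properties = new Properties();
            properties.setProperty("bootstrap.servers", "localhost:9092");
            properties.setProperty("zookeeper.connect", "localhost:2181");
            properties.setProperty("group.id", "temperature-alerts");

            // Hypothetical topic name.
            DataStream<String> raw = env.addSource(
                    new FlinkKafkaConsumer08<>("temperature", new SimpleStringSchema(), properties));

            // Parse defensively: null, empty and malformed payloads are dropped here,
            // so nothing downstream ever sees a null value.
            DataStream<Tuple2<String, Double>> readings =
                    raw.flatMap(new FlatMapFunction<String, Tuple2<String, Double>>() {
                        @Override
                        public void flatMap(String value, Collector<Tuple2<String, Double>> out) {
                            if (value == null || value.trim().isEmpty()) {
                                return;
                            }
                            String[] parts = value.split(",");
                            if (parts.length != 2) {
                                return;
                            }
                            try {
                                out.collect(Tuple2.of(parts[0].trim(), Double.parseDouble(parts[1].trim())));
                            } catch (NumberFormatException ignored) {
                                // skip records whose temperature field is not numeric
                            }
                        }
                    });

            readings
                    .keyBy(0)                                   // key by sensor id
                    .filter(r -> r.f1 > TEMPERATURE_THRESHOLD)  // keep only readings above threshold
                    .map(r -> "ALERT: sensor " + r.f0 + " reported " + r.f1)
                    .print();

            env.execute("Temperature alert");
        }
    }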

Flink checkpoints to Google Cloud Storage

谁说胖子不能爱 submitted on 2019-12-01 11:39:31
I am trying to configure checkpoints for Flink jobs in GCS. Everything works fine if I run a test job locally (no Docker and no cluster setup), but it fails with an error if I run it using docker-compose or a cluster setup and deploy the fat jar with the jobs through the Flink dashboard. Any thoughts on it? Thanks!

    Caused by: org.apache.flink.core.fs.UnsupportedFileSystemSchemeException: Could not find a file system implementation for scheme 'gs'. The scheme is not directly supported by Flink and no Hadoop file system to support this scheme could be loaded.
        at org.apache.flink.core.fs.FileSystem
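The exception means the cluster-side runtime cannot resolve the gs:// scheme: the GCS Hadoop connector (gcs-connector jar plus its Hadoop dependency) generally has to be available on the Flink cluster's own classpath (e.g. in flink/lib on every node), not only inside the fat jar. A minimal sketch of wiring checkpoints to a GCS path, with a hypothetical bucket name:

    import org.apache.flink.runtime.state.filesystem.FsStateBackend;
    import org.apache.flink.streaming.api.CheckpointingMode;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class GcsCheckpointExample {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            // Checkpoint every 60 seconds with exactly-once semantics.
            env.enableCheckpointing(60_000, CheckpointingMode.EXACTLY_ONCE);

            // Hypothetical bucket. The 'gs' scheme is typically resolved through the Hadoop
            // FileSystem, so core-site.xml usually needs
            //   fs.gs.impl = com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem
            // and the gcs-connector jar must be visible to the JobManager and TaskManagers.
            env.setStateBackend(new FsStateBackend("gs://my-bucket/flink-checkpoints"));

            env.socketTextStream("localhost", 9999).print();  // placeholder pipeline

            env.execute("GCS checkpoint example");
        }
    }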

Throughput and Latency on Apache Flink

▼魔方 西西 submitted on 2019-12-01 11:18:53
I have written a very simple Java program for Apache Flink and now I am interested in measuring statistics such as throughput (number of tuples processed per second) and latency (the time the program needs to process each input tuple).

    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    env.readTextFile("/home/LizardKing/Documents/Power/Prova.csv")
       .map(new MyMapper())
       .writeAsCsv("/home/LizardKing/Results.csv");
    JobExecutionResult res = env.execute();

I know that Flink exposes some metrics: https://ci.apache.org/projects/flink/flink-docs-release-1.2
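One way to get a rough throughput number is to register Flink metrics inside a RichMapFunction; a sketch along these lines (the metric names are arbitrary, and MyMapper's actual per-record logic is not shown in the excerpt):

    import org.apache.flink.api.common.functions.RichMapFunction;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.metrics.Counter;
    import org.apache.flink.metrics.Meter;
    import org.apache.flink.metrics.MeterView;

    // Wraps the per-record work and reports how many records pass through per second.
    public class InstrumentedMapper extends RichMapFunction<String, String> {

        private transient Counter recordsProcessed;
        private transient Meter throughput;

        @Override
        public void open(Configuration parameters) {
            recordsProcessed = getRuntimeContext()
                    .getMetricGroup()
                    .counter("recordsProcessed");
            // MeterView computes a per-second rate over the given time span (in seconds).
            throughput = getRuntimeContext()
                    .getMetricGroup()
                    .meter("recordsPerSecond", new MeterView(10));
        }

        @Override
        public String map(String value) {
            recordsProcessed.inc();
            throughput.markEvent();
            // ... the actual per-record transformation would go here ...
            return value;
        }
    }

For latency, Flink's built-in latency markers can be switched on with env.getConfig().setLatencyTrackingInterval(1000), with the caveat that they measure how long markers travel through the pipeline rather than true end-to-end record latency.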

Why flink container vcore size is always 1

喜欢而已 submitted on 2019-12-01 11:02:51
I am running Flink on YARN (more precisely, in an AWS EMR YARN cluster). I read in the Flink documentation and source code that, by default, for each TaskManager container Flink requests a number of vcores equal to the number of slots per TaskManager when requesting resources from YARN. I also confirmed this from the source code:

    // Resource requirements for worker containers
    int taskManagerSlots = taskManagerParameters.numSlots();
    int vcores = config.getInteger(ConfigConstants.YARN_VCORES, Math.max
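In other words, when yarn.containers.vcores is not set, Flink requests one vcore per task slot, and the single vcore shown in the YARN UI is usually just the accounting of the memory-only DefaultResourceCalculator rather than what Flink actually requested. A sketch of the relevant flink-conf.yaml entries (the values here are arbitrary):

    # flink-conf.yaml
    taskmanager.numberOfTaskSlots: 4
    # Request 4 vcores per TaskManager container explicitly instead of relying on the slot count.
    yarn.containers.vcores: 4

For the requested vcores to show up in YARN's own accounting, the cluster scheduler typically also has to use DominantResourceCalculator instead of the default memory-only calculator (yarn.scheduler.capacity.resource-calculator in capacity-scheduler.xml).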

Apache Flink: Kafka connector in Python streaming API, “Cannot load user class”

孤街浪徒 submitted on 2019-12-01 10:36:22
I am trying out Flink's new Python streaming API and attempting to run my script with ./flink-1.6.1/bin/pyflink-stream.sh examples/read_from_kafka.py. The Python script is fairly straightforward; I am just trying to consume from an existing topic and send everything to stdout (or the *.out file in the log directory, where the output method emits data by default).

    import glob
    import os
    import sys
    from java.util import Properties
    from org.apache.flink.streaming.api.functions.source import SourceFunction
    from org.apache.flink.streaming.api.collector.selector import OutputSelector
    from org.apache
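The "Cannot load user class" error usually indicates that the Kafka connector classes are not on the cluster's classpath at runtime; a common remedy is to drop the matching connector jar (e.g. flink-connector-kafka-0.9_2.11-1.6.1.jar, plus its kafka-clients dependency) into the flink-1.6.1/lib directory on every node before submitting the Python script. For reference, a minimal Java sketch of the same consumer wiring, with a hypothetical topic name, that has to resolve against those same classes:

    import java.util.Properties;

    import org.apache.flink.api.common.serialization.SimpleStringSchema;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer09;

    public class ReadFromKafka {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            Properties props = new Properties();
            props.setProperty("bootstrap.servers", "localhost:9092");
            props.setProperty("group.id", "pyflink-test");

            // FlinkKafkaConsumer09 lives in flink-connector-kafka-0.9; if that jar is not on
            // the classpath of the cluster (not just the client), class loading fails.
            env.addSource(new FlinkKafkaConsumer09<>("my-topic", new SimpleStringSchema(), props))
               .print();

            env.execute("Read from Kafka");
        }
    }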

Apache Flink: How to count the total number of events in a DataStream

可紊 submitted on 2019-12-01 10:18:36
I have two raw streams that I am joining, and then I want to count the total number of events that have been joined and how many events have not. I am doing this by using map on joinedEventDataStream, as shown below:

    joinedEventDataStream.map(new RichMapFunction<JoinedEvent, Object>() {
        @Override
        public Object map(JoinedEvent joinedEvent) throws Exception {
            number_of_joined_events += 1;
            return null;
        }
    });

Question # 1: Is this the appropriate way to count the number of
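A common alternative sketch uses Flink's metric system instead of a plain field (the metric name below is arbitrary): a counter registered on the runtime context is reported per parallel subtask in the web UI or any configured metrics reporter, and the mapper can pass the event through rather than returning null.

    import org.apache.flink.api.common.functions.RichMapFunction;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.metrics.Counter;

    // Pass-through mapper that counts the records it sees via a Flink metric.
    // A plain field such as number_of_joined_events is local to one parallel subtask,
    // not visible outside that JVM, and not preserved across restarts.
    public class CountingMapper<T> extends RichMapFunction<T, T> {

        private final String metricName;
        private transient Counter counter;

        public CountingMapper(String metricName) {
            this.metricName = metricName;
        }

        @Override
        public void open(Configuration parameters) {
            counter = getRuntimeContext().getMetricGroup().counter(metricName);
        }

        @Override
        public T map(T value) {
            counter.inc();
            return value;  // forward the event instead of returning null
        }
    }

It would be applied as joinedEventDataStream.map(new CountingMapper<>("numJoinedEvents")); the same wrapper with a different metric name on the non-joined stream covers the "how many have not" half of the question.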

Local Flink config running standalone from IDE

允我心安 submitted on 2019-12-01 06:24:44
If I'd like to run a Flink app locally, directly from within IntelliJ, but I need to specify config params (like fs.hdfs.hdfssite to set up S3 access), is there any other way to provide those config params apart from ExecutionEnvironment.createLocalEnvironment(conf)? What if I want to use StreamExecutionEnvironment.getExecutionEnvironment? Can I have a Flink config in my project and point the local app to it? Is this the proper way to do it? Or would you set up your IDE to submit the app to a real local Flink instance? To create a StreamExecutionEnvironment with configuration options, use
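Continuing the truncated answer: StreamExecutionEnvironment has an overload that takes a parallelism and a Configuration; a minimal sketch, with hypothetical paths and an arbitrary parallelism of 4:

    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class LocalConfigExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Hypothetical path: point Flink at the hdfs-site.xml that carries the filesystem settings.
            conf.setString("fs.hdfs.hdfssite", "/path/to/hdfs-site.xml");

            StreamExecutionEnvironment env =
                    StreamExecutionEnvironment.createLocalEnvironment(4, conf);

            env.fromElements("a", "b", "c").print();
            env.execute("Local environment with custom configuration");
        }
    }

There is also StreamExecutionEnvironment.createLocalEnvironmentWithWebUI(conf) (it needs flink-runtime-web on the classpath) if the local job should expose the dashboard; pointing getExecutionEnvironment at a flink-conf.yaml via the FLINK_CONF_DIR environment variable is sometimes suggested as well, but whether a purely local environment picks it up depends on the Flink version.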

What is the difference between a “stateful” and “stateless” system?

a 夏天 submitted on 2019-12-01 06:23:51
Apache Spark brags that its operators (nodes) are "stateless". This allows Spark's architecture to use simpler protocols for things like recovery, load balancing, and handling stragglers. On the other hand, Apache Flink describes its operators as "stateful" and claims that statefulness is necessary for applications like machine learning. Yet Spark programs are able to pass forward information and maintain application data in RDDs without maintaining "state". What is happening here? Is Spark
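To make "stateful" concrete on the Flink side: an operator can keep Flink-managed keyed state that the runtime checkpoints and restores, and that state belongs to the operator itself rather than to the data it passes along. A minimal sketch, usable after a keyBy (the running-count logic is only illustrative):

    import org.apache.flink.api.common.functions.RichFlatMapFunction;
    import org.apache.flink.api.common.state.ValueState;
    import org.apache.flink.api.common.state.ValueStateDescriptor;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.util.Collector;

    // Emits a running count per key; the count lives in Flink-managed keyed state,
    // so it survives checkpoints and restores and is owned by the operator itself.
    public class RunningCountPerKey extends RichFlatMapFunction<String, String> {

        private transient ValueState<Long> countState;

        @Override
        public void open(Configuration parameters) {
            countState = getRuntimeContext().getState(
                    new ValueStateDescriptor<>("count", Long.class));
        }

        @Override
        public void flatMap(String key, Collector<String> out) throws Exception {
            Long current = countState.value();   // null on the first event for this key
            long updated = (current == null) ? 1L : current + 1L;
            countState.update(updated);
            out.collect(key + " seen " + updated + " times");
        }
    }

It would be applied as stream.keyBy(value -> value).flatMap(new RunningCountPerKey()).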