apache-flink

How to avoid a null value exception in flink-kafka? Any help would do

别等时光非礼了梦想 submitted on 2019-12-01 12:08:35
I'm trying to write a job that raises an alert when the temperature is above a threshold (as defined in the code), but the keyed stream is causing problems. I'm new to Flink and intermediate in Scala. I need help with this code; I've tried almost everything.

    def main(args: Array[String]): Unit = {
      val TEMPERATURE_THRESHOLD: Double = 50.00
      val see: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment
      val properties = new Properties()
      properties.setProperty("bootstrap.servers", "localhost:9092")
      properties.setProperty("zookeeper.connect", "localhost:2181")
      val src = see
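A rough Java sketch of the same pipeline, assuming the Kafka 0.8 connector (which is what the zookeeper.connect property suggests), a hypothetical topic name, and CSV payloads of the form "sensorId,temperature"; malformed or empty records are dropped at the parsing step instead of being allowed to throw, which is one way to avoid null-value exceptions coming out of the source:

    import java.util.Properties;

    import org.apache.flink.api.common.functions.FlatMapFunction;
    import org.apache.flink.api.common.serialization.SimpleStringSchema;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer08;
    import org.apache.flink.util.Collector;

    public class TemperatureAlertJob {

        static final double TEMPERATURE_THRESHOLD = 50.00;

        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            Properties properties = new Properties();
            properties.setProperty("bootstrap.servers", "localhost:9092");
            properties.setProperty("zookeeper.connect", "localhost:2181");
            properties.setProperty("group.id", "temperature-alerts");

            // Hypothetical topic name.
            DataStream<String> raw = env.addSource(
                    new FlinkKafkaConsumer08<>("temperature", new SimpleStringSchema(), properties));

            // Parse defensively: null, empty and malformed payloads are dropped here,
            // so nothing downstream ever sees a null value.
            DataStream<Tuple2<String, Double>> readings =
                    raw.flatMap(new FlatMapFunction<String, Tuple2<String, Double>>() {
                        @Override
                        public void flatMap(String value, Collector<Tuple2<String, Double>> out) {
                            if (value == null || value.trim().isEmpty()) {
                                return;
                            }
                            String[] parts = value.split(",");
                            if (parts.length != 2) {
                                return;
                            }
                            try {
                                out.collect(Tuple2.of(parts[0].trim(), Double.parseDouble(parts[1].trim())));
                            } catch (NumberFormatException ignored) {
                                // skip records whose temperature field is not numeric
                            }
                        }
                    });

            readings
                    .keyBy(0)                                   // key by sensor id
                    .filter(r -> r.f1 > TEMPERATURE_THRESHOLD)  // keep only readings above threshold
                    .map(r -> "ALERT: sensor " + r.f0 + " reported " + r.f1)
                    .print();

            env.execute("Temperature alert");
        }
    }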

Flink checkpoints to Google Cloud Storage

谁说胖子不能爱 submitted on 2019-12-01 11:39:31
I am trying to configure checkpoints for Flink jobs in GCS. Everything works fine if I run a test job locally (no Docker and no cluster setup), but it fails with an error if I run it using docker-compose or a cluster setup and deploy the fat jar with the jobs through the Flink dashboard. Any thoughts on it? Thanks!

    Caused by: org.apache.flink.core.fs.UnsupportedFileSystemSchemeException: Could not find a file system implementation for scheme 'gs'. The scheme is not directly supported by Flink and no Hadoop file system to support this scheme could be loaded.
        at org.apache.flink.core.fs.FileSystem
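The exception means the cluster-side runtime cannot resolve the gs:// scheme: the GCS Hadoop connector (gcs-connector jar plus its Hadoop dependency) generally has to be available on the Flink cluster's own classpath (e.g. in flink/lib on every node), not only inside the fat jar. A minimal sketch of wiring checkpoints to a GCS path, with a hypothetical bucket name:

    import org.apache.flink.runtime.state.filesystem.FsStateBackend;
    import org.apache.flink.streaming.api.CheckpointingMode;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class GcsCheckpointExample {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            // Checkpoint every 60 seconds with exactly-once semantics.
            env.enableCheckpointing(60_000, CheckpointingMode.EXACTLY_ONCE);

            // Hypothetical bucket. The 'gs' scheme is typically resolved through the Hadoop
            // FileSystem, so core-site.xml usually needs
            //   fs.gs.impl = com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem
            // and the gcs-connector jar must be visible to the JobManager and TaskManagers.
            env.setStateBackend(new FsStateBackend("gs://my-bucket/flink-checkpoints"));

            env.socketTextStream("localhost", 9999).print();  // placeholder pipeline

            env.execute("GCS checkpoint example");
        }
    }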

Throughput and Latency on Apache Flink

▼魔方 西西 submitted on 2019-12-01 11:18:53
I have written a very simple Java program for Apache Flink and now I am interested in measuring statistics such as throughput (number of tuples processed per second) and latency (the time the program needs to process each input tuple).

    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    env.readTextFile("/home/LizardKing/Documents/Power/Prova.csv")
       .map(new MyMapper())
       .writeAsCsv("/home/LizardKing/Results.csv");
    JobExecutionResult res = env.execute();

I know that Flink exposes some metrics: https://ci.apache.org/projects/flink/flink-docs-release-1.2
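One way to get a rough throughput number is to register Flink metrics inside a RichMapFunction; a sketch along these lines (the metric names are arbitrary, and MyMapper's actual per-record logic is not shown in the excerpt):

    import org.apache.flink.api.common.functions.RichMapFunction;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.metrics.Counter;
    import org.apache.flink.metrics.Meter;
    import org.apache.flink.metrics.MeterView;

    // Wraps the per-record work and reports how many records pass through per second.
    public class InstrumentedMapper extends RichMapFunction<String, String> {

        private transient Counter recordsProcessed;
        private transient Meter throughput;

        @Override
        public void open(Configuration parameters) {
            recordsProcessed = getRuntimeContext()
                    .getMetricGroup()
                    .counter("recordsProcessed");
            // MeterView computes a per-second rate over the given time span (in seconds).
            throughput = getRuntimeContext()
                    .getMetricGroup()
                    .meter("recordsPerSecond", new MeterView(10));
        }

        @Override
        public String map(String value) {
            recordsProcessed.inc();
            throughput.markEvent();
            // ... the actual per-record transformation would go here ...
            return value;
        }
    }

For latency, Flink's built-in latency markers can be switched on with env.getConfig().setLatencyTrackingInterval(1000), with the caveat that they measure how long markers travel through the pipeline rather than true end-to-end record latency.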

Why flink container vcore size is always 1

喜欢而已 submitted on 2019-12-01 11:02:51
I am running Flink on YARN (more precisely, in an AWS EMR YARN cluster). I read in the Flink documentation and source code that, by default, for each TaskManager container Flink requests a number of vcores equal to the number of slots per TaskManager when requesting resources from YARN. I also confirmed this from the source code:

    // Resource requirements for worker containers
    int taskManagerSlots = taskManagerParameters.numSlots();
    int vcores = config.getInteger(ConfigConstants.YARN_VCORES, Math.max
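In other words, when yarn.containers.vcores is not set, Flink requests one vcore per task slot, and the single vcore shown in the YARN UI is usually just the accounting of the memory-only DefaultResourceCalculator rather than what Flink actually requested. A sketch of the relevant flink-conf.yaml entries (the values here are arbitrary):

    # flink-conf.yaml
    taskmanager.numberOfTaskSlots: 4
    # Request 4 vcores per TaskManager container explicitly instead of relying on the slot count.
    yarn.containers.vcores: 4

For the requested vcores to show up in YARN's own accounting, the cluster scheduler typically also has to use DominantResourceCalculator instead of the default memory-only calculator (yarn.scheduler.capacity.resource-calculator in capacity-scheduler.xml).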

Apache Flink: Kafka connector in Python streaming API, “Cannot load user class”

孤街浪徒 submitted on 2019-12-01 10:36:22
I am trying out Flink's new Python streaming API and attempting to run my script with ./flink-1.6.1/bin/pyflink-stream.sh examples/read_from_kafka.py. The Python script is fairly straightforward; I am just trying to consume from an existing topic and send everything to stdout (or the *.out file in the log directory, where the output method emits data by default).

    import glob
    import os
    import sys
    from java.util import Properties
    from org.apache.flink.streaming.api.functions.source import SourceFunction
    from org.apache.flink.streaming.api.collector.selector import OutputSelector
    from org.apache
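The "Cannot load user class" error usually indicates that the Kafka connector classes are not on the cluster's classpath at runtime; a common remedy is to drop the matching connector jar (e.g. flink-connector-kafka-0.9_2.11-1.6.1.jar, plus its kafka-clients dependency) into the flink-1.6.1/lib directory on every node before submitting the Python script. For reference, a minimal Java sketch of the same consumer wiring, with a hypothetical topic name, that has to resolve against those same classes:

    import java.util.Properties;

    import org.apache.flink.api.common.serialization.SimpleStringSchema;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer09;

    public class ReadFromKafka {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            Properties props = new Properties();
            props.setProperty("bootstrap.servers", "localhost:9092");
            props.setProperty("group.id", "pyflink-test");

            // FlinkKafkaConsumer09 lives in flink-connector-kafka-0.9; if that jar is not on
            // the classpath of the cluster (not just the client), class loading fails.
            env.addSource(new FlinkKafkaConsumer09<>("my-topic", new SimpleStringSchema(), props))
               .print();

            env.execute("Read from Kafka");
        }
    }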

Apache Flink: How to count the total number of events in a DataStream

可紊 submitted on 2019-12-01 10:18:36
I have two raw streams that I am joining, and then I want to count the total number of events that have been joined and how many events have not. I am doing this by using map on joinedEventDataStream, as shown below:

    joinedEventDataStream.map(new RichMapFunction<JoinedEvent, Object>() {
        @Override
        public Object map(JoinedEvent joinedEvent) throws Exception {
            number_of_joined_events += 1;
            return null;
        }
    });

Question # 1: Is this the appropriate way to count the number of
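A common alternative sketch uses Flink's metric system instead of a plain field (the metric name below is arbitrary): a counter registered on the runtime context is reported per parallel subtask in the web UI or any configured metrics reporter, and the mapper can pass the event through rather than returning null.

    import org.apache.flink.api.common.functions.RichMapFunction;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.metrics.Counter;

    // Pass-through mapper that counts the records it sees via a Flink metric.
    // A plain field such as number_of_joined_events is local to one parallel subtask,
    // not visible outside that JVM, and not preserved across restarts.
    public class CountingMapper<T> extends RichMapFunction<T, T> {

        private final String metricName;
        private transient Counter counter;

        public CountingMapper(String metricName) {
            this.metricName = metricName;
        }

        @Override
        public void open(Configuration parameters) {
            counter = getRuntimeContext().getMetricGroup().counter(metricName);
        }

        @Override
        public T map(T value) {
            counter.inc();
            return value;  // forward the event instead of returning null
        }
    }

It would be applied as joinedEventDataStream.map(new CountingMapper<>("numJoinedEvents")); the same wrapper with a different metric name on the non-joined stream covers the "how many have not" half of the question.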

Local Flink config running standalone from IDE

允我心安 submitted on 2019-12-01 06:24:44
If I'd like to run a Flink app locally, directly from within IntelliJ, but I need to specify config params (like fs.hdfs.hdfssite to set up S3 access), is there any other way to provide those config params apart from ExecutionEnvironment.createLocalEnvironment(conf)? What if I want to use StreamExecutionEnvironment.getExecutionEnvironment? Can I have a Flink config in my project and point the local app to it? Is this the proper way to do it? Or would you set up your IDE to submit the app to a real local Flink instance? To create a StreamExecutionEnvironment with configuration options, use
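Continuing the truncated answer: StreamExecutionEnvironment has an overload that takes a parallelism and a Configuration; a minimal sketch, with hypothetical paths and an arbitrary parallelism of 4:

    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class LocalConfigExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Hypothetical path: point Flink at the hdfs-site.xml that carries the filesystem settings.
            conf.setString("fs.hdfs.hdfssite", "/path/to/hdfs-site.xml");

            StreamExecutionEnvironment env =
                    StreamExecutionEnvironment.createLocalEnvironment(4, conf);

            env.fromElements("a", "b", "c").print();
            env.execute("Local environment with custom configuration");
        }
    }

There is also StreamExecutionEnvironment.createLocalEnvironmentWithWebUI(conf) (it needs flink-runtime-web on the classpath) if the local job should expose the dashboard; pointing getExecutionEnvironment at a flink-conf.yaml via the FLINK_CONF_DIR environment variable is sometimes suggested as well, but whether a purely local environment picks it up depends on the Flink version.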

What is the difference between a “stateful” and “stateless” system?

a 夏天 submitted on 2019-12-01 06:23:51
Apache Spark brags that its operators (nodes) are "stateless". This allows Spark's architecture to use simpler protocols for things like recovery, load balancing, and handling stragglers. On the other hand, Apache Flink describes its operators as "stateful" and claims that statefulness is necessary for applications like machine learning. Yet Spark programs are able to pass forward information and maintain application data in RDDs without maintaining "state". What is happening here? Is Spark
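To make "stateful" concrete on the Flink side: an operator can keep Flink-managed keyed state that the runtime checkpoints and restores, and that state belongs to the operator itself rather than to the data it passes along. A minimal sketch, usable after a keyBy (the running-count logic is only illustrative):

    import org.apache.flink.api.common.functions.RichFlatMapFunction;
    import org.apache.flink.api.common.state.ValueState;
    import org.apache.flink.api.common.state.ValueStateDescriptor;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.util.Collector;

    // Emits a running count per key; the count lives in Flink-managed keyed state,
    // so it survives checkpoints and restores and is owned by the operator itself.
    public class RunningCountPerKey extends RichFlatMapFunction<String, String> {

        private transient ValueState<Long> countState;

        @Override
        public void open(Configuration parameters) {
            countState = getRuntimeContext().getState(
                    new ValueStateDescriptor<>("count", Long.class));
        }

        @Override
        public void flatMap(String key, Collector<String> out) throws Exception {
            Long current = countState.value();   // null on the first event for this key
            long updated = (current == null) ? 1L : current + 1L;
            countState.update(updated);
            out.collect(key + " seen " + updated + " times");
        }
    }

It would be applied as stream.keyBy(value -> value).flatMap(new RunningCountPerKey()).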