apache-flink

IOException while connecting to Twitter Streaming API with Apache Flink

别等时光非礼了梦想. submitted on 2019-12-10 16:38:21
Question: I wrote a small Scala program which uses the Apache Flink Streaming API to read Twitter tweets.

object TwitterWordCount {

  private val properties = "/home/twitter-login.properties"

  def main(args: Array[String]) {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    val twitterStream = env.addSource(new TwitterSource(properties))
    val tweets = twitterStream
      .flatMap(new JSONParseFlatMap[String, String] {
        override def flatMap(in: String, out: Collector[String]): Unit = {
          if (getString(in,
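Note: the exception itself is cut off above, but a common source of connection failures is how the credentials are supplied. In the 1.x versions of the flink-connector-twitter module, TwitterSource is configured with a java.util.Properties object rather than a path to a properties file. A minimal sketch along those lines (all credential values are placeholders):

import java.util.Properties

import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.connectors.twitter.TwitterSource

object TwitterWordCount {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    // Placeholder credentials; supply your own Twitter app keys here.
    val props = new Properties()
    props.setProperty(TwitterSource.CONSUMER_KEY, "<consumer-key>")
    props.setProperty(TwitterSource.CONSUMER_SECRET, "<consumer-secret>")
    props.setProperty(TwitterSource.TOKEN, "<access-token>")
    props.setProperty(TwitterSource.TOKEN_SECRET, "<access-token-secret>")

    // Raw tweet JSON arrives as strings; print is a stand-in for the real parsing logic.
    val tweets: DataStream[String] = env.addSource(new TwitterSource(props))
    tweets.print()

    env.execute("Twitter word count")
  }
}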

How to build and use flink-connector-kinesis?

丶灬走出姿态 submitted on 2019-12-10 16:34:11
Question: I'm trying to use Apache Flink with AWS Kinesis. The documentation says that I have to build the connector on my own. Therefore, I built the connector, added the jar file to my project, and also put the dependency in my pom.xml file.

<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-connector-kinesis_2.11</artifactId>
  <version>1.6.1</version>
</dependency>

However, when I tried to build using mvn clean package I got an error message like this: [INFO] -----------------------<
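Note: the question is cut off before the actual error, but for Flink releases of this vintage the Kinesis connector was not published to Maven Central because of its Amazon Software License dependency, so the artifact has to be installed into the local Maven repository from the Flink sources before the pom dependency above can resolve. A rough sketch of the documented procedure (version taken from the question; exact flags may differ per release):

# check out the Flink sources for the matching release
git clone -b release-1.6.1 https://github.com/apache/flink.git
cd flink

# build with the Kinesis profile enabled; this installs
# flink-connector-kinesis_2.11:1.6.1 into the local ~/.m2 repository
mvn clean install -Pinclude-kinesis -DskipTests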

Flink dynamic scaling

℡╲_俬逩灬. submitted on 2019-12-10 15:51:31
Question: I am currently studying scalability in Flink. Starting from version 1.2.0, dynamic rescaling was introduced. I am looking at scaling a long-running job which reads data from a Kafka source. Questions regarding dynamic rescaling: to scale out my Flink application, for example by adding new task managers, must I restart the job / YARN session to use the newly added resources? I think it's possible to write a YARN client that deploys new task managers and makes them talk to the job manager; is that already
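Note: the question is truncated, but at least in the Flink releases of this era rescaling a running job means taking a savepoint and restarting the job with a new parallelism; newly registered task managers are only used from that point on. A rough sketch with the standard CLI (job id, savepoint path, and jar name are placeholders):

# take a savepoint of the running job
bin/flink savepoint <jobId>

# cancel the old job, then resubmit it from the savepoint with a higher parallelism
bin/flink cancel <jobId>
bin/flink run -s <savepointPath> -p 8 my-streaming-job.jar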

Apache Flink: ProcessWindowFunction implementation

三世轮回 submitted on 2019-12-10 15:36:57
Question: I am trying to use a ProcessWindowFunction in my Apache Flink project using Scala. Unfortunately, I already fail at implementing a basic ProcessWindowFunction as it is used in the Apache Flink documentation. This is my code:

import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.scala.{StreamExecutionEnvironment, _}
import org.apache.flink.streaming.api.windowing.time.Time
import org.fiware.cosmos.orion.flink.connector.{NgsiEvent, OrionSource}
import org.apache
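Note: the code above is cut off after the imports. For reference, a basic keyed ProcessWindowFunction in the Scala API follows the pattern from the Flink documentation; the tuple type, window size, and socket source below are only illustrative, not the Orion/NGSI setup from the question:

import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.scala.function.ProcessWindowFunction
import org.apache.flink.streaming.api.windowing.time.Time
import org.apache.flink.streaming.api.windowing.windows.TimeWindow
import org.apache.flink.util.Collector

// Counts elements per key and window: IN = (String, Long), OUT = String,
// KEY = String, W = TimeWindow.
class CountPerWindow extends ProcessWindowFunction[(String, Long), String, String, TimeWindow] {
  override def process(key: String,
                       context: Context,
                       elements: Iterable[(String, Long)],
                       out: Collector[String]): Unit = {
    out.collect(s"key=$key window=${context.window} count=${elements.size}")
  }
}

object ProcessWindowExample {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    // Unbounded text source (e.g. `nc -lk 9999`); each line becomes a (word, 1L) pair.
    val input: DataStream[(String, Long)] = env
      .socketTextStream("localhost", 9999)
      .map(word => (word, 1L))

    input
      .keyBy(_._1)
      .timeWindow(Time.seconds(10))
      .process(new CountPerWindow)
      .print()

    env.execute("ProcessWindowFunction example")
  }
}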

zipWithIndex on Apache Flink

陌路散爱 submitted on 2019-12-10 14:32:00
Question: I'd like to assign each row of my input an id, which should be a number from 0 to N - 1, where N is the number of rows in the input. Roughly, I'd like to be able to do something like the following:

val data = sc.textFile(textFilePath, numPartitions)
val rdd = data.map(line => process(line))
val rddMatrixLike = rdd.zipWithIndex.map { case (v, idx) => someStuffWithIndex(idx, v) }

but in Apache Flink. Is it possible?

Answer 1: This is now part of the 0.10-SNAPSHOT release of Apache Flink.
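Note: the answer is cut short above. In the DataSet API the utility lives in the Scala utils package, which adds zipWithIndex (and zipWithUniqueId) as extension methods on DataSet. A minimal sketch, with the input path and per-row processing as placeholders:

import org.apache.flink.api.scala._
import org.apache.flink.api.scala.utils._ // brings zipWithIndex into scope on DataSet

object ZipWithIndexExample {
  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment
    val data: DataSet[String] = env.readTextFile("/path/to/input") // placeholder path

    // Assigns each element a consecutive Long index from 0 to N - 1.
    val indexed: DataSet[(Long, String)] = data.zipWithIndex

    indexed
      .map { case (idx, line) => s"$idx -> $line" } // placeholder for someStuffWithIndex
      .print()
  }
}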

Apache Flink: NullPointerException caused by TupleSerializer

本秂侑毒 submitted on 2019-12-10 13:31:43
Question: When I execute my Flink application it gives me this NullPointerException:

2017-08-08 13:21:57,690 INFO com.datastax.driver.core.Cluster - New Cassandra host /127.0.0.1:9042 added
2017-08-08 13:22:02,427 INFO org.apache.flink.runtime.taskmanager.Task - TriggerWindow(TumblingEventTimeWindows(30000), ListStateDescriptor{serializer=org.apache.flink.api.common.typeutils.base.ListSerializer@15d1c80b}, EventTimeTrigger(), WindowedStream.apply(CoGroupedStreams.java:302)) -> Filter -> Flat Map ->

Measure job execution time in Flink

和自甴很熟 submitted on 2019-12-10 13:15:54
Question: Is there any way to measure job execution time in Apache Flink when submitting the job to Flink using the command line? P.S. I want the Flink API to give me the time rather than measuring it myself in bash by noting the start and end times.

Answer 1: The ExecutionEnvironment.execute() method returns a JobExecutionResult object containing the job runtime. You could for example do something like this:

// execute program
JobExecutionResult result = env.execute("My Flink Job");
System.out.println("The job
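Note: the answer snippet is cut off above; the runtime itself is exposed through JobExecutionResult.getNetRuntime(). A small Scala sketch of the same idea (the pipeline and output path are just placeholders):

import java.util.concurrent.TimeUnit

import org.apache.flink.api.scala._

object RuntimeExample {
  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment

    // Placeholder pipeline; any job works here.
    env.fromElements(1, 2, 3).map(_ * 2).writeAsText("/tmp/runtime-example-out")

    // execute() blocks until the job finishes and returns a JobExecutionResult.
    val result = env.execute("My Flink Job")
    println(s"The job took ${result.getNetRuntime(TimeUnit.MILLISECONDS)} ms to execute")
  }
}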

Degree of parallelism in Apache Flink

跟風遠走 submitted on 2019-12-10 12:43:15
Question: Can I set a different degree of parallelism for different parts of the task in our program in Flink? For instance, how does Flink interpret the following sample code? The two custom partitioners MyPartitioner1 and MyPartitioner2 partition the input data into 4 and 2 partitions.

partitionedData1 = inputData1
    .partitionCustom(new MyPartitioner1(), 1);
env.setParallelism(4);
DataSet<Tuple2<Integer, Integer>> output1 = partitionedData1
    .mapPartition(new calculateFun());
partitionedData2 = inputData2
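Note: the sample is truncated, but the general question has a direct answer: besides the environment-wide default set with env.setParallelism, each operator can be given its own parallelism by calling setParallelism on that operator. A sketch in Scala (data, functions, and output path are placeholders, not the ones from the question):

import org.apache.flink.api.scala._

object PerOperatorParallelism {
  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment
    env.setParallelism(2) // default for every operator that does not override it

    val input: DataSet[(Int, Int)] = env.fromElements((1, 1), (2, 2), (3, 3))

    // This map runs with 4 parallel instances, overriding the environment default.
    val doubled = input
      .map(t => (t._1, t._2 * 2))
      .setParallelism(4)

    // This filter runs with the environment default of 2.
    val filtered = doubled.filter(_._2 > 2)

    filtered.writeAsText("/tmp/parallelism-example-out") // placeholder output path
    env.execute("Per-operator parallelism example")
  }
}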

Flink Kafka - how to make the app run in parallel?

被刻印的时光 ゝ submitted on 2019-12-10 11:17:49
Question: I am creating an app in Flink to read messages from a topic, do some simple processing on them, and write the results to a different topic. My code does work, however it does not run in parallel. How do I do that? It seems my code runs only on one thread/block. On the Flink Web Dashboard the app goes to running status, but there is only one block shown in the overview subtasks, and Bytes Received / Sent and Records Received / Sent are always zero (no update). Here is my code, please assist me in learning how to split
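Note: the code itself is cut off, but the usual reasons a Kafka pipeline shows only a single subtask are a job parallelism of 1 and/or a source topic with a single partition (the Kafka source cannot use more parallel instances than there are partitions). A rough sketch of a pipeline configured to run with several parallel subtasks, assuming Flink's universal Kafka connector; topic names, broker address, and group id are placeholders:

import java.util.Properties

import org.apache.flink.api.common.serialization.SimpleStringSchema
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.connectors.kafka.{FlinkKafkaConsumer, FlinkKafkaProducer}

object ParallelKafkaJob {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    // Job-wide parallelism; only effective for the source if "input-topic" has >= 4 partitions.
    env.setParallelism(4)

    val props = new Properties()
    props.setProperty("bootstrap.servers", "localhost:9092") // placeholder broker
    props.setProperty("group.id", "parallel-kafka-example")  // placeholder consumer group

    env
      .addSource(new FlinkKafkaConsumer[String]("input-topic", new SimpleStringSchema(), props))
      .map(_.toUpperCase) // stand-in for the real processing step
      .addSink(new FlinkKafkaProducer[String]("output-topic", new SimpleStringSchema(), props))

    env.execute("Parallel Kafka pipeline")
  }
}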

How to increase Flink taskmanager.numberOfTaskSlots to run it without a Flink server (in IDE or fat jar)

耗尽温柔 submitted on 2019-12-10 09:28:37
Question: I have a question about running a Flink streaming job in the IDE or as a fat jar without deploying it to a Flink server. The problem is that I cannot run it in the IDE when I have more than 1 task slot in my job.

public class StreamingJob {
    public static void main(String[] args) throws Exception {
        // set up the streaming execution environment
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        Properties kafkaProperties = new Properties();
        kafkaProperties.setProperty(
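Note: the snippet is cut off above, but when the job runs inside the IDE it uses a local mini-cluster whose number of task slots can be raised through a Configuration passed to the local environment. A sketch of that idea in Scala, using the Java StreamExecutionEnvironment factory method that accepts a Configuration (the slot count and parallelism values are only examples):

import org.apache.flink.configuration.{Configuration, TaskManagerOptions}
import org.apache.flink.streaming.api.environment.{StreamExecutionEnvironment => JStreamEnv}

object LocalSlotsExample {
  def main(args: Array[String]): Unit = {
    val conf = new Configuration()
    // Give the embedded (local) task manager 4 slots instead of the default.
    conf.setInteger(TaskManagerOptions.NUM_TASK_SLOTS, 4)

    // Local environment with parallelism 4, backed by the configuration above.
    val env = JStreamEnv.createLocalEnvironment(4, conf)

    env
      .fromElements("a", "b", "c") // placeholder source instead of the Kafka consumer
      .print()

    env.execute("Local environment with more task slots")
  }
}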