apache-flink

Confused about FLINK task slot

廉价感情. Submitted on 2019-12-12 11:07:25
Question: I know a TaskManager can have several task slots. But what exactly is a task slot: a JVM process, an object in memory, or a thread?

Answer 1: The answer might come late, but: a TaskManager (TM) is a JVM process, whereas a task slot (TS) is a thread within the respective JVM process (TM). The managed memory of a TM is split equally among the task slots within it. No CPU isolation happens between the slots; only the managed memory is divided. Moreover, task slots in the same TM share TCP connections (via multiplexing) and heartbeat messages.
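The equal split of managed memory described in the answer can be sketched with a bit of arithmetic; the class and method names here are illustrative, not Flink API:

```java
// Illustrates how a TaskManager's managed memory is divided equally among
// its task slots. This mirrors Flink's behavior conceptually; it is not Flink code.
public class SlotMemory {
    static long managedMemoryPerSlot(long tmManagedBytes, int numSlots) {
        if (numSlots <= 0) {
            throw new IllegalArgumentException("numSlots must be positive");
        }
        return tmManagedBytes / numSlots; // equal split; no CPU isolation is implied
    }

    public static void main(String[] args) {
        long tmManaged = 4L * 1024 * 1024 * 1024; // a TM with 4 GiB of managed memory
        int slots = 4;
        // each slot gets a quarter of the managed memory
        System.out.println(managedMemoryPerSlot(tmManaged, slots));
    }
}
```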

ClassNotFoundException: org.apache.flink.streaming.api.checkpoint.CheckpointNotifier while consuming a kafka topic

强颜欢笑. Submitted on 2019-12-12 10:56:03
Question: I am using the latest Flink-1.1.2-Hadoop-27 and flink-connector-kafka-0.10.2-hadoop1 jars. The Flink consumer is as below:

```java
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
if (properties == null) {
    properties = new Properties();
    InputStream props = Resources.getResource(KAFKA_CONFIGURATION_FILE).openStream();
    properties.load(props);
}
DataStream<String> stream = env.addSource(
    new FlinkKafkaConsumer082<>(KAFKA_SIP_TOPIC, new SimpleStringSchema(), properties));
```
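This ClassNotFoundException usually points to a version mismatch: CheckpointNotifier existed in pre-1.0 Flink and was replaced by CheckpointListener in Flink 1.0, so an old Kafka connector class such as FlinkKafkaConsumer082, compiled against the removed interface, cannot run on a 1.1.2 runtime. The usual fix is to use the connector artifact whose version matches the Flink version. The coordinates below are a sketch for Flink 1.1.2 with a Kafka 0.8 broker and Scala 2.10; adjust the Kafka and Scala suffixes to your setup:

```
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-kafka-0.8_2.10</artifactId>
    <version>1.1.2</version>
</dependency>
```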

How to stop a flink streaming job from program

让人想犯罪. Submitted on 2019-12-12 08:36:41
Question: I am trying to create a JUnit test for a Flink streaming job that writes data to a Kafka topic and reads data from the same Kafka topic, using FlinkKafkaProducer09 and FlinkKafkaConsumer09 respectively. I am passing test data into the producer:

```java
DataStream<String> stream = env.fromElements("tom", "jerry", "bill");
```

and checking whether the same data comes back from the consumer:

```java
List<String> expected = Arrays.asList("tom", "jerry", "bill");
List<String> result = resultSink.getResult();
assertEquals(expected, result);
```
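A common pattern behind a resultSink like the one above is a sink that appends every record to a static, thread-safe list, which the test reads back after the job finishes. Stripped of Flink's SinkFunction interface, the idea reduces to this (class and method names are illustrative):

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// A minimal stand-in for a "collecting sink": parallel writers append to a
// shared thread-safe list; the test inspects it once processing is done.
public class CollectingSink {
    // static so every parallel sink instance in the same JVM sees the same list
    static final List<String> RESULTS = new CopyOnWriteArrayList<>();

    static void invoke(String value) { // in Flink this would be SinkFunction#invoke
        RESULTS.add(value);
    }

    public static void main(String[] args) {
        for (String s : new String[] {"tom", "jerry", "bill"}) {
            invoke(s);
        }
        System.out.println(RESULTS);
    }
}
```

With bounded input from fromElements, the job terminates on its own once the source is exhausted. For stopping an unbounded job from a test, a widely used trick is to throw a designated "success" exception from the source or sink and catch it around env.execute().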

Flink throwing serialization error when reading from HBase

非 Y 不嫁゛. Submitted on 2019-12-12 05:08:58
Question: When I read from HBase using a RichFlatMapFunction inside a map, I get a serialization error. What I am trying to do: if an element of the datastream equals a particular string, read from HBase; otherwise ignore it. Below is the sample program and the error I am getting:

```scala
package com.abb.Flinktest

import java.text.SimpleDateFormat
import java.util.Properties
import scala.collection.concurrent.TrieMap
import org.apache.flink.addons.hbase.TableInputFormat
import org.apache.flink.api.common.functions.RichFlatMapFunction
```
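Serialization errors like this usually mean the function object holds a non-serializable field (an HBase connection or table handle) that Flink tries to ship to the cluster when deploying the job. The standard fix is to mark such fields transient and create them in the rich function's open() method. The Flink-independent core of that pattern, with illustrative names and a StringBuilder standing in for the connection:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Demonstrates why "transient" fixes function-serialization errors: the
// transient field is skipped during serialization and must be re-created
// after deserialization (in Flink: inside RichFunction#open).
public class HBaseReader implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String tableName;        // serializable config travels with the object
    private transient StringBuilder conn;  // stand-in for a non-serializable connection

    HBaseReader(String tableName) { this.tableName = tableName; }

    void open() { conn = new StringBuilder("connected:" + tableName); }

    boolean isConnected() { return conn != null; }

    static HBaseReader roundTrip(HBaseReader in) { // what shipping the function does
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            new ObjectOutputStream(bos).writeObject(in);
            ObjectInputStream ois =
                new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()));
            return (HBaseReader) ois.readObject();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        HBaseReader reader = new HBaseReader("sip_events");
        reader.open();
        HBaseReader shipped = roundTrip(reader);
        System.out.println(shipped.isConnected()); // false: transient field did not travel
        shipped.open();                            // re-create the connection on the worker
    }
}
```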

Flink with Kafka Consumer doesn't work

大城市里の小女人. Submitted on 2019-12-12 05:08:30
Question: I want to benchmark Spark vs Flink, and for this purpose I am running several tests. However, Flink doesn't work with Kafka, while the same setup works perfectly with Spark. The code is very simple:

```scala
val env: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment
val properties = new Properties()
properties.setProperty("bootstrap.servers", "localhost:9092")
properties.setProperty("group.id", "myGroup")
println("topic: " + args(0))
val stream = env.addSource(new FlinkKafkaConsumer09[String]
```
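A first sanity check in cases like this is version alignment: FlinkKafkaConsumer09 uses the new Kafka 0.9 consumer API, so the broker and the connector artifact must both match that line, and the properties must carry the two keys that consumer requires. A minimal, Flink-free sketch of that check (broker address and group name are illustrative):

```java
import java.util.Properties;

// The Kafka 0.9 consumer API needs "bootstrap.servers" (broker list) and
// "group.id". This sketch just validates that a Properties object carries
// both before it is handed to the connector.
public class KafkaProps {
    static Properties consumerProps(String brokers, String groupId) {
        Properties p = new Properties();
        p.setProperty("bootstrap.servers", brokers);
        p.setProperty("group.id", groupId);
        return p;
    }

    static boolean looksComplete(Properties p) {
        return p.getProperty("bootstrap.servers") != null
            && p.getProperty("group.id") != null;
    }

    public static void main(String[] args) {
        Properties p = consumerProps("localhost:9092", "myGroup");
        System.out.println(looksComplete(p)); // true
    }
}
```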

Apache Flink: Correctly make async webservice calls within MapReduce()

我与影子孤独终老i. Submitted on 2019-12-12 05:05:04
Question: I have a program with the following mapPartition function:

```java
public void mapPartition(Iterable<Tuple> values, Collector<Tuple2<Integer, String>> out)
```

I collect batches of 100 from the input values and send them to a web service for conversion, adding the results back to the out collector. To speed up the process, I made the web service calls asynchronous through the use of Executors. This created issues: I either get the "taskManager released" exception or an AskTimeoutException. I increased
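A typical cause of those errors with raw Executors is that mapPartition returns before the futures complete, so the task shuts down underneath the in-flight calls (Flink 1.2+ offers AsyncDataStream with an AsyncFunction as the supported route for async I/O). The core discipline, waiting for the whole batch before emitting, can be sketched with stdlib tools; the "web service" here is a stand-in:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Fires async "web service" calls for a batch and, crucially, blocks until
// every future completes before returning, so no call outlives the task.
public class AsyncBatch {
    static String callWebService(String in) { // stand-in for the real remote call
        return in.toUpperCase();
    }

    static List<String> convertBatch(List<String> batch, ExecutorService pool) {
        List<CompletableFuture<String>> futures = new ArrayList<>();
        for (String item : batch) {
            futures.add(CompletableFuture.supplyAsync(() -> callWebService(item), pool));
        }
        List<String> results = new ArrayList<>();
        for (CompletableFuture<String> f : futures) {
            results.add(f.join()); // wait for completion; order of the batch is preserved
        }
        return results;
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        System.out.println(convertBatch(List.of("tom", "jerry"), pool)); // [TOM, JERRY]
        pool.shutdown();
    }
}
```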

How can I inspect the internal flink timestamp for an item in a stream using the Processing Time Model?

杀马特。学长 韩版系。学妹. Submitted on 2019-12-12 04:41:14
Question: I am looking to tag the data in my stream with the time it arrived in Flink so that I can perform some calculations. I recognize that with the Event Time model I would have direct control over that, but I was hoping there was some easy way to discover the timestamp Flink uses when making window decisions on a stream.

Answer 1: Flink supports three modes of working with time:

- Processing Time: events are processed with respect to the current time of each operator.
- Event Time: events are
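Under processing time, the simplest way to capture arrival time is to stamp each element yourself in the first map after the source, since Flink does not expose an inspectable processing-time timestamp per element. The stamping step itself is just the following (the wrapper type and names are illustrative):

```java
// Tags a value with its arrival time, the first thing one would do in a
// map() after the source when running on processing time.
public class ArrivalTag {
    static final class Tagged {
        final String value;
        final long arrivalMillis; // wall-clock time when the element was seen
        Tagged(String value, long arrivalMillis) {
            this.value = value;
            this.arrivalMillis = arrivalMillis;
        }
    }

    static Tagged tag(String value) {
        return new Tagged(value, System.currentTimeMillis());
    }

    public static void main(String[] args) {
        Tagged t = tag("event-1");
        System.out.println(t.value + " arrived at " + t.arrivalMillis);
    }
}
```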

How to access/read kafka topic data from flink?

前提是你. Submitted on 2019-12-12 04:36:26
Question: I am trying to read Kafka data from Flink, and as I am new to both Kafka and Flink, I don't know how to connect them.

Answer 1: Flink provides a Kafka connector. In order to read data from Kafka topics, you first need to add the Flink-Kafka connector dependency:

```
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-kafka-0.8_2.10</artifactId>
    <version>1.1.3</version>
</dependency>
```

Next, you simply create a streaming execution environment and add a Kafka source. Here is a sample Properties
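The answer is cut off at the Properties. For the 0.8 connector it references, the consumer needs at minimum a broker list, the ZooKeeper quorum (required by the Kafka 0.8 consumer), and a consumer group; the addresses below are illustrative:

```java
import java.util.Properties;

// Minimal configuration the Kafka 0.8 connector expects; these Properties
// would be passed to "new FlinkKafkaConsumer08<>(topic, schema, props)".
public class Kafka08Props {
    static Properties build() {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // Kafka broker list
        props.setProperty("zookeeper.connect", "localhost:2181"); // required by the 0.8 consumer
        props.setProperty("group.id", "test-group");              // consumer group
        return props;
    }

    public static void main(String[] args) {
        System.out.println(build());
    }
}
```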

Flink state backend for TaskManager

三世轮回. Submitted on 2019-12-12 03:33:40
Question: I have a Flink v1.2 setup with one JobManager and two TaskManagers, each in its own VM. I configured the state backend to filesystem and pointed it to a local location on each of the above hosts (state.backend.fs.checkpointdir: file:///home/ubuntu/Prototype/flink/flink-checkpoints). I have set parallelism to 1 and each TaskManager has 1 slot. I then run an event processing job on the JobManager, which assigns it to a TaskManager. I kill the TaskManager running the job and after a few
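A file:// checkpoint directory must be readable by the JobManager and every TaskManager; a path that is local to each VM breaks recovery the moment the job is rescheduled onto the other TaskManager, because the checkpoint files are not there. The usual fix is pointing the backend at a shared filesystem (HDFS, NFS) in flink-conf.yaml; the HDFS path below is illustrative:

```
state.backend: filesystem
# must be reachable from the JobManager and all TaskManagers
state.backend.fs.checkpointdir: hdfs://namenode:8020/flink/checkpoints
```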

How to do kerberos authentication on a flink standalone installation?

十年热恋. Submitted on 2019-12-12 03:26:17
Question: I have a standalone Flink installation on top of which I want to run a streaming job that writes data into an HDFS installation. The HDFS installation is part of a Cloudera deployment and requires Kerberos authentication in order to read and write HDFS. Since I found no documentation on how to make Flink connect to a Kerberos-protected HDFS, I had to make some educated guesses about the procedure. Here is what I did so far: I created a keytab file for my user. In my Flink job, I added
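For reference, Flink 1.2 and later configure Kerberos in flink-conf.yaml rather than in job code: the cluster processes log in from a keytab at startup, and HDFS access then picks up the credentials automatically. The keys below come from that mechanism; the keytab path and principal are illustrative:

```
security.kerberos.login.use-ticket-cache: false
security.kerberos.login.keytab: /home/user/flink.keytab
security.kerberos.login.principal: flink-user@EXAMPLE.COM
```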