apache-flink

How can I create an External Catalog Table in Apache Flink

本小妞迷上赌 submitted on 2019-12-13 00:56:41
Question: I tried to create an ExternalCatalog to use with the Apache Flink Table API. I created it and added it to the Flink table environment (following the official documentation). For some reason, the only external table present in the catalog is not found during the scan. What did I miss in the code below?

val catalogName = s"externalCatalog$fileNumber"
val ec: ExternalCatalog = getExternalCatalog(catalogName, 1, tableEnv)
tableEnv.registerExternalCatalog(catalogName, ec)
val s1: Table = tableEnv.scan("S_EXT")
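One thing worth checking (a hedged sketch, not part of the original question): in the Flink 1.x Table API, scan takes the full table path as varargs, so a table that lives inside a registered external catalog is usually reached through the catalog name rather than by the bare table name. "database1" below is a placeholder for whatever sub-catalog getExternalCatalog(...) actually creates.

// Sketch only: address the table through its full catalog path.
val s1: Table = tableEnv.scan(catalogName, "database1", "S_EXT")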

Duplicate files copied in APK reference.conf

隐身守侯 submitted on 2019-12-13 00:19:32
Question: I want to use my Android app as a "producing client" for Kafka. After adding the following dependencies:

// https://mvnrepository.com/artifact/org.apache.flink/flink-java
compile group: 'org.apache.flink', name: 'flink-java', version: '1.1.3'
// https://mvnrepository.com/artifact/org.apache.flink/flink-streaming-java_2.10
compile group: 'org.apache.flink', name: 'flink-streaming-java_2.10', version: '1.1.3'
// https://mvnrepository.com/artifact/org.apache.flink/flink-clients_2.10
compile group:
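The duplicate-file error in the title is usually caused by several Flink/Akka artifacts each bundling their own reference.conf. A commonly suggested workaround, sketched here against the Android Gradle plugin's packagingOptions DSL (not part of the original question), is:

android {
    packagingOptions {
        // Keep the first reference.conf found on the classpath instead of failing on the duplicate.
        pickFirst 'reference.conf'
    }
}

Note that picking a single file can drop configuration keys contributed by the other jars; merging the reference.conf files (for example in a shading step) is the more robust fix.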

Read data from Redis to Flink

帅比萌擦擦* submitted on 2019-12-12 23:16:04
Question: I have been trying to find a connector to read data from Redis into Flink. Flink's documentation describes a connector for writing to Redis, but I need to read data from Redis in my Flink job. In "Using Apache Flink for data streaming", Fabian mentioned that it is possible to read data from Redis. Which connector can be used for this purpose?

Answer 1: We are running one in production that looks roughly like this:

class RedisSource extends RichSourceFunction[SomeDataType] {
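The answer's snippet is cut off above. Below is a minimal sketch of how such a Redis-backed source could look, assuming the Jedis client and a Redis list used as a queue; the class, host, and key names are made up for illustration.

import org.apache.flink.configuration.Configuration
import org.apache.flink.streaming.api.functions.source.{RichSourceFunction, SourceFunction}
import redis.clients.jedis.Jedis

class RedisListSource(host: String, key: String) extends RichSourceFunction[String] {
  @volatile private var running = true
  private var jedis: Jedis = _

  override def open(parameters: Configuration): Unit = {
    jedis = new Jedis(host)          // one connection per parallel source instance
  }

  override def run(ctx: SourceFunction.SourceContext[String]): Unit = {
    while (running) {
      val value = jedis.lpop(key)    // pop the next element from a Redis list
      if (value != null) ctx.collect(value)
      else Thread.sleep(50)          // back off when the list is empty
    }
  }

  override def cancel(): Unit = running = false

  override def close(): Unit = if (jedis != null) jedis.close()
}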

Apache Flink - enable join ordering

僤鯓⒐⒋嵵緔 submitted on 2019-12-12 19:24:43
Question: I have noticed that Apache Flink does not optimise the order in which tables are joined. At the moment it keeps the user-specified join order (basically, it takes the query literally). I suppose that Apache Calcite can optimise the order of joins, but for some reason these rules are not in use in Apache Flink. If, for example, we have two tables 'R' and 'S':

private val tableEnv: BatchTableEnvironment = TableEnvironment.getTableEnvironment(env)
private val fileNumber = 1
tableEnv
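A heavily hedged sketch of how Calcite's join-reordering rules might be injected through Flink's CalciteConfig (the exact builder methods vary across Flink 1.x releases, so treat this as an assumption rather than a confirmed API):

import org.apache.calcite.rel.rules.{JoinAssociateRule, JoinCommuteRule}
import org.apache.calcite.tools.RuleSets
import org.apache.flink.table.calcite.CalciteConfigBuilder

// Extend the logical optimization rule set with Calcite's join-reordering rules.
val calciteConfig = new CalciteConfigBuilder()
  .addLogicalOptRuleSet(RuleSets.ofList(JoinCommuteRule.INSTANCE, JoinAssociateRule.INSTANCE))
  .build()
// tableEnv is the BatchTableEnvironment from the question.
tableEnv.getConfig.setCalciteConfig(calciteConfig)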

Enriching DataStream using static DataSet in Flink streaming

ⅰ亾dé卋堺 submitted on 2019-12-12 19:06:08
Question: I am writing a Flink streaming program in which I need to enrich a DataStream of user events with a static data set (an information base, IB). For example, say we have a static data set of buyers and an incoming clickstream of events; for each event we want to add a boolean flag indicating whether the doer of the event is a buyer or not. An ideal way to achieve this would be to partition the incoming stream by user id, have the buyers set available in a DataSet partitioned again by
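For the simple case where the static buyer set fits in memory, a minimal sketch (not the partitioned DataSet/DataStream approach the question is asking about) is to load the set once per parallel instance in a RichMapFunction; the event types and file path below are hypothetical:

import org.apache.flink.api.common.functions.RichMapFunction
import org.apache.flink.configuration.Configuration

// Hypothetical event types; the original question does not define them.
case class ClickEvent(userId: String, url: String)
case class EnrichedEvent(userId: String, url: String, isBuyer: Boolean)

class BuyerEnricher(buyerFile: String) extends RichMapFunction[ClickEvent, EnrichedEvent] {
  private var buyers: Set[String] = _

  override def open(parameters: Configuration): Unit = {
    // Load the static buyer set once per parallel instance.
    buyers = scala.io.Source.fromFile(buyerFile).getLines().toSet
  }

  override def map(event: ClickEvent): EnrichedEvent =
    EnrichedEvent(event.userId, event.url, buyers.contains(event.userId))
}

// Usage: clickStream.map(new BuyerEnricher("/path/to/buyers.txt"))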

What's the difference between a watermark and a trigger in Flink?

你。 submitted on 2019-12-12 18:27:27
Question: I read that "...The ordering operator has to buffer all elements it receives. Then, when it receives a watermark it can sort all elements that have a timestamp lower than the watermark and emit them in sorted order. This is correct because the watermark signals that no more elements can arrive that would be intermixed with the sorted elements..." (https://cwiki.apache.org/confluence/display/FLINK/Time+and+Order+in+Streams). Hence, it seems that the watermark serves as a signal to
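To make the distinction concrete, here is a small sketch (event type and values are made up): the watermark assigner declares how far event time has progressed, while the trigger decides when a window actually emits its result.

import org.apache.flink.streaming.api.TimeCharacteristic
import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.time.Time
import org.apache.flink.streaming.api.windowing.triggers.EventTimeTrigger

case class Event(key: String, value: Long, timestamp: Long) // hypothetical event type

val env = StreamExecutionEnvironment.getExecutionEnvironment
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)

val events: DataStream[Event] = env.fromElements(
  Event("a", 1, 1000L), Event("a", 2, 5000L), Event("a", 3, 70000L))

val result = events
  // Watermarks: declare event-time progress, allowing elements up to 10s out of order.
  .assignTimestampsAndWatermarks(
    new BoundedOutOfOrdernessTimestampExtractor[Event](Time.seconds(10)) {
      override def extractTimestamp(e: Event): Long = e.timestamp
    })
  .keyBy(_.key)
  .timeWindow(Time.minutes(1))
  // Trigger: decides when the window fires; EventTimeTrigger (the default for event-time
  // windows) fires once the watermark passes the end of the window.
  .trigger(EventTimeTrigger.create())
  .sum("value")

result.print()
env.execute("watermark vs trigger sketch")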

Consume GCS files based on pattern from Flink

空扰寡人 submitted on 2019-12-12 18:24:29
Question: Since Flink supports the Hadoop FileSystem abstraction, and there is a GCS connector, a library that implements it on top of Google Cloud Storage, how do I create a Flink file source using the code in this repo?

Answer 1: To achieve this you need to:
1. Install and configure the GCS connector on your Flink cluster.
2. Add the Hadoop and Flink dependencies (including the HDFS connector) to your project:

<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-scala_2.11</artifactId>
  <version>${flink.version}
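Once the GCS connector jar is on Flink's classpath and the fs.gs.* properties are configured in core-site.xml, a gs:// path can be read like any other Hadoop path. A minimal sketch, with placeholder bucket and path:

import org.apache.flink.api.scala._

val env = ExecutionEnvironment.getExecutionEnvironment
// Assumes the GCS connector is installed and registered via core-site.xml.
val lines: DataSet[String] = env.readTextFile("gs://my-bucket/input/")
lines.first(10).print()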

Flink exactly-once message processing

前提是你 submitted on 2019-12-12 18:17:27
Question: I've set up a Flink 1.2 standalone cluster with 2 JobManagers and 3 TaskManagers, and I'm using JMeter to load-test it by producing Kafka messages/events which are then processed. The processing job runs on a TaskManager and usually handles ~15K events/s. The job has EXACTLY_ONCE checkpointing enabled and persists state and checkpoints to Amazon S3. If I shut down the TaskManager running the job, it takes a few seconds and then the job is resumed on a different TaskManager. The job mainly
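For reference, a minimal sketch of the checkpointing setup described in the question (interval and S3 path are placeholders); note that EXACTLY_ONCE here refers to Flink's internal state consistency, not necessarily to end-to-end delivery into external sinks.

import org.apache.flink.runtime.state.filesystem.FsStateBackend
import org.apache.flink.streaming.api.CheckpointingMode
import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment

val env = StreamExecutionEnvironment.getExecutionEnvironment
// Checkpoint every 10 seconds with exactly-once state guarantees.
env.enableCheckpointing(10000, CheckpointingMode.EXACTLY_ONCE)
// Persist checkpoints to S3 (path is a placeholder).
env.setStateBackend(new FsStateBackend("s3://my-bucket/flink/checkpoints"))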

Apache Flink integration with Elasticsearch

巧了我就是萌 submitted on 2019-12-12 18:05:56
Question: I am trying to integrate Flink with Elasticsearch 2.1.1. I am using the Maven dependency

<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-connector-elasticsearch2_2.10</artifactId>
  <version>1.1-SNAPSHOT</version>
</dependency>

and here is the Java code where I read the events from a Kafka queue (which works fine), but somehow the events are not getting posted to Elasticsearch and there is no error either. In the code below, if I change any of the settings related to
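The original code is in Java and is cut off above; as a hedged sketch of the same setup in Scala (the document's other snippets are Scala), the two settings that most often explain "nothing arrives, no error" are a mismatched cluster.name and a bulk flush that never triggers. Host, index, and cluster names below are placeholders.

import java.net.{InetAddress, InetSocketAddress}
import java.util.{ArrayList => JArrayList, HashMap => JHashMap}

import org.apache.flink.api.common.functions.RuntimeContext
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.connectors.elasticsearch2.{ElasticsearchSink, ElasticsearchSinkFunction, RequestIndexer}
import org.elasticsearch.client.Requests

val env = StreamExecutionEnvironment.getExecutionEnvironment
// Stand-in for the Kafka-backed stream from the question; elements are JSON strings.
val events: DataStream[String] = env.fromElements("""{"message":"hello"}""")

val config = new JHashMap[String, String]()
config.put("cluster.name", "my-es-cluster")   // must match the cluster.name of the ES installation
config.put("bulk.flush.max.actions", "1")     // flush every element, useful while debugging

val transports = new JArrayList[InetSocketAddress]()
transports.add(new InetSocketAddress(InetAddress.getByName("127.0.0.1"), 9300))

events.addSink(new ElasticsearchSink[String](config, transports,
  new ElasticsearchSinkFunction[String] {
    override def process(element: String, ctx: RuntimeContext, indexer: RequestIndexer): Unit = {
      indexer.add(Requests.indexRequest()
        .index("my-index")
        .`type`("my-type")
        .source(element))
    }
  }))

env.execute("es sink sketch")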

Flink: calculate median on a stream

六月ゝ 毕业季﹏ submitted on 2019-12-12 17:36:29
Question: I'm required to calculate the median of many parameters received from a Kafka stream over a 15-minute time window. I couldn't find any built-in function for that, but I have found a way using a custom WindowFunction. My questions are: is this a difficult task for Flink? The data can be very large. If the data gets to gigabytes, will Flink store everything in memory until the end of the time window? (One of the arguments of the WindowFunction's apply implementation is an Iterable, a collection of all the data which
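A minimal sketch of the WindowFunction approach the question mentions, with hypothetical (parameterName, value) input; it illustrates the memory concern, since apply only runs once the whole 15-minute window has been buffered:

import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.scala.function.WindowFunction
import org.apache.flink.streaming.api.windowing.time.Time
import org.apache.flink.streaming.api.windowing.windows.TimeWindow
import org.apache.flink.util.Collector

class MedianWindowFunction extends WindowFunction[(String, Double), (String, Double), String, TimeWindow] {
  override def apply(key: String, window: TimeWindow,
                     input: Iterable[(String, Double)],
                     out: Collector[(String, Double)]): Unit = {
    // All elements of the window are materialized here, i.e. the full 15 minutes are buffered.
    val sorted = input.map(_._2).toIndexedSeq.sorted
    val n = sorted.size
    val median = if (n % 2 == 1) sorted(n / 2) else (sorted(n / 2 - 1) + sorted(n / 2)) / 2.0
    out.collect((key, median))
  }
}

val env = StreamExecutionEnvironment.getExecutionEnvironment
// Stand-in for the Kafka-backed stream of (parameterName, value) readings.
val readings: DataStream[(String, Double)] =
  env.fromElements(("sensor-1", 1.0), ("sensor-1", 3.0), ("sensor-1", 2.0))

val medians = readings
  .keyBy(_._1)
  .timeWindow(Time.minutes(15))
  .apply(new MedianWindowFunction)

medians.print()
env.execute("median sketch")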