apache-flink

How can I create an External Catalog Table in Apache Flink

本小妞迷上赌 submitted on 2019-12-13 00:56:41
Question: I tried to create an ExternalCatalog to use with the Apache Flink Table API. I created it and added it to the Flink table environment (following the official documentation). For some reason, the only external table present in the catalog is not found during the scan. What did I miss in the code below?

val catalogName = s"externalCatalog$fileNumber"
val ec: ExternalCatalog = getExternalCatalog(catalogName, 1, tableEnv)
tableEnv.registerExternalCatalog(catalogName, ec)
val s1: Table = tableEnv.scan("S_EXT")
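One thing worth checking (a hedged sketch, not part of the original question): in the Flink 1.x Table API, scan takes the full table path as varargs, so a table that lives inside a registered external catalog is usually reached through the catalog name rather than by the bare table name. "database1" below is a placeholder for whatever sub-catalog getExternalCatalog(...) actually creates.

// Sketch only: address the table through its full catalog path.
val s1: Table = tableEnv.scan(catalogName, "database1", "S_EXT")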

Duplicate files copied in APK reference.conf

隐身守侯 submitted on 2019-12-13 00:19:32
Question: I want to use my Android app as a "producing client" for Kafka. After adding the following dependencies:

// https://mvnrepository.com/artifact/org.apache.flink/flink-java
compile group: 'org.apache.flink', name: 'flink-java', version: '1.1.3'
// https://mvnrepository.com/artifact/org.apache.flink/flink-streaming-java_2.10
compile group: 'org.apache.flink', name: 'flink-streaming-java_2.10', version: '1.1.3'
// https://mvnrepository.com/artifact/org.apache.flink/flink-clients_2.10
compile group:
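The duplicate-file error in the title is usually caused by several Flink/Akka artifacts each bundling their own reference.conf. A commonly suggested workaround, sketched here against the Android Gradle plugin's packagingOptions DSL (not part of the original question), is:

android {
    packagingOptions {
        // Keep the first reference.conf found on the classpath instead of failing on the duplicate.
        pickFirst 'reference.conf'
    }
}

Note that picking a single file can drop configuration keys contributed by the other jars; merging the reference.conf files (for example in a shading step) is the more robust fix.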

Read data from Redis to Flink

帅比萌擦擦* submitted on 2019-12-12 23:16:04
Question: I have been trying to find a connector to read data from Redis into Flink. Flink's documentation describes a connector for writing to Redis, but I need to read data from Redis in my Flink job. In "Using Apache Flink for data streaming", Fabian mentioned that it is possible to read data from Redis. Which connector can be used for this purpose?

Answer 1: We are running one in production that looks roughly like this:

class RedisSource extends RichSourceFunction[SomeDataType] {
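The answer's snippet is cut off above. Below is a minimal sketch of how such a Redis-backed source could look, assuming the Jedis client and a Redis list used as a queue; the class, host, and key names are made up for illustration.

import org.apache.flink.configuration.Configuration
import org.apache.flink.streaming.api.functions.source.{RichSourceFunction, SourceFunction}
import redis.clients.jedis.Jedis

class RedisListSource(host: String, key: String) extends RichSourceFunction[String] {
  @volatile private var running = true
  private var jedis: Jedis = _

  override def open(parameters: Configuration): Unit = {
    jedis = new Jedis(host)          // one connection per parallel source instance
  }

  override def run(ctx: SourceFunction.SourceContext[String]): Unit = {
    while (running) {
      val value = jedis.lpop(key)    // pop the next element from a Redis list
      if (value != null) ctx.collect(value)
      else Thread.sleep(50)          // back off when the list is empty
    }
  }

  override def cancel(): Unit = running = false

  override def close(): Unit = if (jedis != null) jedis.close()
}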

Apache Flink - enable join ordering

僤鯓⒐⒋嵵緔 submitted on 2019-12-12 19:24:43
Question: I have noticed that Apache Flink does not optimise the order in which tables are joined. At the moment it keeps the user-specified join order (basically, it takes the query literally). I suppose that Apache Calcite can optimise the order of joins, but for some reason these rules are not in use in Apache Flink. If, for example, we have two tables 'R' and 'S':

private val tableEnv: BatchTableEnvironment = TableEnvironment.getTableEnvironment(env)
private val fileNumber = 1
tableEnv
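A heavily hedged sketch of how Calcite's join-reordering rules might be injected through Flink's CalciteConfig (the exact builder methods vary across Flink 1.x releases, so treat this as an assumption rather than a confirmed API):

import org.apache.calcite.rel.rules.{JoinAssociateRule, JoinCommuteRule}
import org.apache.calcite.tools.RuleSets
import org.apache.flink.table.calcite.CalciteConfigBuilder

// Extend the logical optimization rule set with Calcite's join-reordering rules.
val calciteConfig = new CalciteConfigBuilder()
  .addLogicalOptRuleSet(RuleSets.ofList(JoinCommuteRule.INSTANCE, JoinAssociateRule.INSTANCE))
  .build()
// tableEnv is the BatchTableEnvironment from the question.
tableEnv.getConfig.setCalciteConfig(calciteConfig)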

Enriching DataStream using static DataSet in Flink streaming

ⅰ亾dé卋堺 submitted on 2019-12-12 19:06:08
Question: I am writing a Flink streaming program in which I need to enrich a DataStream of user events with a static data set (an information base, IB). For example, say we have a static data set of buyers and an incoming clickstream of events; for each event we want to add a boolean flag indicating whether the doer of the event is a buyer or not. An ideal way to achieve this would be to partition the incoming stream by user id, have the buyers set available in a DataSet partitioned again by
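For the simple case where the static buyer set fits in memory, a minimal sketch (not the partitioned DataSet/DataStream approach the question is asking about) is to load the set once per parallel instance in a RichMapFunction; the event types and file path below are hypothetical:

import org.apache.flink.api.common.functions.RichMapFunction
import org.apache.flink.configuration.Configuration

// Hypothetical event types; the original question does not define them.
case class ClickEvent(userId: String, url: String)
case class EnrichedEvent(userId: String, url: String, isBuyer: Boolean)

class BuyerEnricher(buyerFile: String) extends RichMapFunction[ClickEvent, EnrichedEvent] {
  private var buyers: Set[String] = _

  override def open(parameters: Configuration): Unit = {
    // Load the static buyer set once per parallel instance.
    buyers = scala.io.Source.fromFile(buyerFile).getLines().toSet
  }

  override def map(event: ClickEvent): EnrichedEvent =
    EnrichedEvent(event.userId, event.url, buyers.contains(event.userId))
}

// Usage: clickStream.map(new BuyerEnricher("/path/to/buyers.txt"))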

What's the difference between a watermark and a trigger in Flink?

你。 submitted on 2019-12-12 18:27:27
Question: I read that "...The ordering operator has to buffer all elements it receives. Then, when it receives a watermark it can sort all elements that have a timestamp lower than the watermark and emit them in sorted order. This is correct because the watermark signals that no more elements can arrive that would be intermixed with the sorted elements..." (https://cwiki.apache.org/confluence/display/FLINK/Time+and+Order+in+Streams). Hence, it seems that the watermark serves as a signal to
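To make the distinction concrete, here is a small sketch (event type and values are made up): the watermark assigner declares how far event time has progressed, while the trigger decides when a window actually emits its result.

import org.apache.flink.streaming.api.TimeCharacteristic
import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.time.Time
import org.apache.flink.streaming.api.windowing.triggers.EventTimeTrigger

case class Event(key: String, value: Long, timestamp: Long) // hypothetical event type

val env = StreamExecutionEnvironment.getExecutionEnvironment
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)

val events: DataStream[Event] = env.fromElements(
  Event("a", 1, 1000L), Event("a", 2, 5000L), Event("a", 3, 70000L))

val result = events
  // Watermarks: declare event-time progress, allowing elements up to 10s out of order.
  .assignTimestampsAndWatermarks(
    new BoundedOutOfOrdernessTimestampExtractor[Event](Time.seconds(10)) {
      override def extractTimestamp(e: Event): Long = e.timestamp
    })
  .keyBy(_.key)
  .timeWindow(Time.minutes(1))
  // Trigger: decides when the window fires; EventTimeTrigger (the default for event-time
  // windows) fires once the watermark passes the end of the window.
  .trigger(EventTimeTrigger.create())
  .sum("value")

result.print()
env.execute("watermark vs trigger sketch")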

Consume GCS files based on pattern from Flink

空扰寡人 submitted on 2019-12-12 18:24:29
Question: Since Flink supports the Hadoop FileSystem abstraction, and there is a GCS connector, a library that implements it on top of Google Cloud Storage, how do I create a Flink file source using the code in this repo?

Answer 1: To achieve this you need to:
1. Install and configure the GCS connector on your Flink cluster.
2. Add the Hadoop and Flink dependencies (including the HDFS connector) to your project:

<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-scala_2.11</artifactId>
  <version>${flink.version}
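Once the GCS connector jar is on Flink's classpath and the fs.gs.* properties are configured in core-site.xml, a gs:// path can be read like any other Hadoop path. A minimal sketch, with placeholder bucket and path:

import org.apache.flink.api.scala._

val env = ExecutionEnvironment.getExecutionEnvironment
// Assumes the GCS connector is installed and registered via core-site.xml.
val lines: DataSet[String] = env.readTextFile("gs://my-bucket/input/")
lines.first(10).print()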

Flink exactly-once message processing

前提是你 submitted on 2019-12-12 18:17:27
Question: I've set up a Flink 1.2 standalone cluster with 2 JobManagers and 3 TaskManagers, and I'm using JMeter to load-test it by producing Kafka messages/events which are then processed. The processing job runs on a TaskManager and usually handles ~15K events/s. The job has EXACTLY_ONCE checkpointing enabled and persists state and checkpoints to Amazon S3. If I shut down the TaskManager running the job, it takes a few seconds and then the job is resumed on a different TaskManager. The job mainly
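For reference, a minimal sketch of the checkpointing setup described in the question (interval and S3 path are placeholders); note that EXACTLY_ONCE here refers to Flink's internal state consistency, not necessarily to end-to-end delivery into external sinks.

import org.apache.flink.runtime.state.filesystem.FsStateBackend
import org.apache.flink.streaming.api.CheckpointingMode
import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment

val env = StreamExecutionEnvironment.getExecutionEnvironment
// Checkpoint every 10 seconds with exactly-once state guarantees.
env.enableCheckpointing(10000, CheckpointingMode.EXACTLY_ONCE)
// Persist checkpoints to S3 (path is a placeholder).
env.setStateBackend(new FsStateBackend("s3://my-bucket/flink/checkpoints"))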

Apache Flink integration with Elasticsearch

巧了我就是萌 submitted on 2019-12-12 18:05:56
Question: I am trying to integrate Flink with Elasticsearch 2.1.1. I am using the Maven dependency

<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-connector-elasticsearch2_2.10</artifactId>
  <version>1.1-SNAPSHOT</version>
</dependency>

and here is the Java code where I read the events from a Kafka queue (which works fine), but somehow the events are not getting posted to Elasticsearch and there is no error either. In the code below, if I change any of the settings related to
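The original code is in Java and is cut off above; as a hedged sketch of the same setup in Scala (the document's other snippets are Scala), the two settings that most often explain "nothing arrives, no error" are a mismatched cluster.name and a bulk flush that never triggers. Host, index, and cluster names below are placeholders.

import java.net.{InetAddress, InetSocketAddress}
import java.util.{ArrayList => JArrayList, HashMap => JHashMap}

import org.apache.flink.api.common.functions.RuntimeContext
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.connectors.elasticsearch2.{ElasticsearchSink, ElasticsearchSinkFunction, RequestIndexer}
import org.elasticsearch.client.Requests

val env = StreamExecutionEnvironment.getExecutionEnvironment
// Stand-in for the Kafka-backed stream from the question; elements are JSON strings.
val events: DataStream[String] = env.fromElements("""{"message":"hello"}""")

val config = new JHashMap[String, String]()
config.put("cluster.name", "my-es-cluster")   // must match the cluster.name of the ES installation
config.put("bulk.flush.max.actions", "1")     // flush every element, useful while debugging

val transports = new JArrayList[InetSocketAddress]()
transports.add(new InetSocketAddress(InetAddress.getByName("127.0.0.1"), 9300))

events.addSink(new ElasticsearchSink[String](config, transports,
  new ElasticsearchSinkFunction[String] {
    override def process(element: String, ctx: RuntimeContext, indexer: RequestIndexer): Unit = {
      indexer.add(Requests.indexRequest()
        .index("my-index")
        .`type`("my-type")
        .source(element))
    }
  }))

env.execute("es sink sketch")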

Flink: calculate median on a stream

六月ゝ 毕业季﹏ submitted on 2019-12-12 17:36:29
Question: I'm required to calculate the median of many parameters received from a Kafka stream over a 15-minute time window. I couldn't find any built-in function for that, but I have found a way using a custom WindowFunction. My questions are: is this a difficult task for Flink? The data can be very large. If the data gets to gigabytes, will Flink store everything in memory until the end of the time window? (One of the arguments of the WindowFunction's apply implementation is an Iterable, a collection of all the data which
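A minimal sketch of the WindowFunction approach the question mentions, with hypothetical (parameterName, value) input; it illustrates the memory concern, since apply only runs once the whole 15-minute window has been buffered:

import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.scala.function.WindowFunction
import org.apache.flink.streaming.api.windowing.time.Time
import org.apache.flink.streaming.api.windowing.windows.TimeWindow
import org.apache.flink.util.Collector

class MedianWindowFunction extends WindowFunction[(String, Double), (String, Double), String, TimeWindow] {
  override def apply(key: String, window: TimeWindow,
                     input: Iterable[(String, Double)],
                     out: Collector[(String, Double)]): Unit = {
    // All elements of the window are materialized here, i.e. the full 15 minutes are buffered.
    val sorted = input.map(_._2).toIndexedSeq.sorted
    val n = sorted.size
    val median = if (n % 2 == 1) sorted(n / 2) else (sorted(n / 2 - 1) + sorted(n / 2)) / 2.0
    out.collect((key, median))
  }
}

val env = StreamExecutionEnvironment.getExecutionEnvironment
// Stand-in for the Kafka-backed stream of (parameterName, value) readings.
val readings: DataStream[(String, Double)] =
  env.fromElements(("sensor-1", 1.0), ("sensor-1", 3.0), ("sensor-1", 2.0))

val medians = readings
  .keyBy(_._1)
  .timeWindow(Time.minutes(15))
  .apply(new MedianWindowFunction)

medians.print()
env.execute("median sketch")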