apache-flink

Task not serializable in Flink

浪子不回头ぞ submitted on 2020-01-14 10:39:33
Question: I am trying to run the basic PageRank example in Flink with a small modification (only in reading the input file; everything else is the same). I am getting a "Task not serializable" error. Below is part of the error output:

```
at org.apache.flink.api.scala.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:179)
at org.apache.flink.api.scala.ClosureCleaner$.clean(ClosureCleaner.scala:171)
```

Below is my code (the snippet is cut off in the original post):

```scala
object hpdb {
  def main(args: Array[String]) {
    val env =
```
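The posted code is cut off, but the stack trace points at the ClosureCleaner's serializability check, which fails when a function passed to an operator captures something Flink cannot serialize. A minimal sketch of the usual cause and fix, with hypothetical names (not the poster's actual code):

```scala
import org.apache.flink.api.scala._

object ClosureFix {

  // Hypothetical helper that does NOT extend Serializable.
  class PageRankConfig { def damping: Double = 0.85 }

  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment
    val config = new PageRankConfig

    // BAD: `ranks.map(r => r * config.damping)` would capture `config`
    // in the closure and fail ClosureCleaner.ensureSerializable.

    // GOOD: copy the needed value into a local, serializable val so the
    // closure captures only a Double.
    val damping = config.damping
    val ranks = env.fromElements(1.0, 1.0, 1.0).map(r => r * damping)

    ranks.print()
  }
}
```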

What does it mean that “broadcast state” unblocks the implementation of the “dynamic patterns” feature for Flink’s CEP library?

烂漫一生 submitted on 2020-01-14 06:00:28
Question: From the Flink 1.5 release announcement, we know Flink now supports "broadcast state", described as follows: "broadcast state unblocks the implementation of the 'dynamic patterns' feature for Flink's CEP library." Does this mean we can now use "broadcast state" to implement "dynamic patterns" without Flink CEP? Also, I don't understand the difference between implementing "dynamic patterns" for Flink CEP with and without broadcast state. I would appreciate it if someone could give …
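For context, a minimal sketch of what "dynamic patterns via broadcast state" looks like outside of CEP, assuming Flink 1.5+; the Rule type and the single-predicate matching are hypothetical stand-ins for real pattern definitions. Rule updates arrive on a broadcast stream and are stored in broadcast state, so every parallel instance evaluates events against the current rule set:

```scala
import scala.collection.JavaConverters._

import org.apache.flink.api.common.state.MapStateDescriptor
import org.apache.flink.api.common.typeinfo.BasicTypeInfo
import org.apache.flink.streaming.api.functions.co.BroadcastProcessFunction
import org.apache.flink.streaming.api.scala._
import org.apache.flink.util.Collector

object DynamicRulesSketch {

  // Hypothetical rule type standing in for a dynamically updatable pattern.
  case class Rule(id: String, needle: String)

  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    val events: DataStream[String] = env.fromElements("error: disk full", "all ok")
    val rules: DataStream[Rule] = env.fromElements(Rule("r1", "error"))

    val ruleDescriptor = new MapStateDescriptor[String, Rule](
      "rules", BasicTypeInfo.STRING_TYPE_INFO, createTypeInformation[Rule])

    events
      .connect(rules.broadcast(ruleDescriptor))
      .process(new BroadcastProcessFunction[String, Rule, String] {

        override def processElement(
            event: String,
            ctx: BroadcastProcessFunction[String, Rule, String]#ReadOnlyContext,
            out: Collector[String]): Unit = {
          // Evaluate the event against every rule currently in broadcast state.
          for (entry <- ctx.getBroadcastState(ruleDescriptor).immutableEntries().asScala) {
            if (event.contains(entry.getValue.needle))
              out.collect(s"rule ${entry.getKey} matched: $event")
          }
        }

        override def processBroadcastElement(
            rule: Rule,
            ctx: BroadcastProcessFunction[String, Rule, String]#Context,
            out: Collector[String]): Unit = {
          // Rule updates are written to broadcast state, which Flink
          // replicates to all parallel instances of this operator.
          ctx.getBroadcastState(ruleDescriptor).put(rule.id, rule)
        }
      })
      .print()

    env.execute("dynamic-rules-sketch")
  }
}
```

What CEP's "dynamic patterns" adds on top of this is evaluating full pattern sequences (NFAs) rather than one predicate per rule; broadcast state provides the mechanism for distributing the rule updates.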

Integration - Apache Flink + Spring Boot

强颜欢笑 submitted on 2020-01-13 04:30:49
Question: I'm testing the integration between Apache Flink and Spring Boot. Running them in the IDE works fine, but when I tried to run on an Apache Flink cluster I got an exception related to the ClassLoader. The classes are really simple:

BootFlinkApplication

```java
@SpringBootApplication
@ComponentScan("com.example.demo")
public class BootFlinkApplication {
    public static void main(String[] args) {
        System.out.println("some test");
        SpringApplication.run(BootFlinkApplication.class, args);
    }
}
```

FlinkTest (the snippet is cut off in the original post)

```java
@Service
public
```
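The stack trace is cut off, but ClassLoader exceptions when submitting a Spring Boot fat jar to a cluster frequently come from Flink's child-first classloading clashing with dependencies that also exist on the cluster's classpath. A sketch of one common mitigation, assuming Flink 1.4+ (whether it applies here depends on the actual exception):

```yaml
# flink-conf.yaml (cluster side). Assumption: Flink 1.4 or later.
# Flink defaults to child-first classloading; switching to parent-first
# can resolve conflicts when the same library is present both in the
# user jar and on the cluster classpath.
classloader.resolve-order: parent-first
```

Packaging the job as a properly shaded fat jar (relocating conflicting dependencies) is the other usual approach.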

How to configure Flink to understand the Azure Data Lake file system?

喜你入骨 submitted on 2020-01-07 04:07:47
Question: I am using Flink to read data from Azure Data Lake, but Flink is not able to find the Azure Data Lake file system. How do I configure Flink to understand the Azure Data Lake file system? Could anyone guide me on this?

Answer 1: Flink can connect to any Hadoop-compatible file system (i.e. one that implements org.apache.hadoop.fs.FileSystem). See here for the explanation: https://ci.apache.org/projects/flink/flink-docs-release-0.8/example_connectors.html

In core-site.xml, you should …
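The answer breaks off at the core-site.xml part. Based on the Hadoop ADLS Gen1 connector (hadoop-azure-datalake), the configuration would plausibly look like the sketch below; the exact property set is an assumption since the original answer is truncated, and the placeholder credentials must be replaced:

```xml
<!-- core-site.xml: sketch assuming the hadoop-azure-datalake connector
     jars are on Flink's classpath; TENANT_ID/CLIENT_ID/CLIENT_SECRET
     are placeholders for an Azure service principal. -->
<configuration>
  <property>
    <name>fs.adl.impl</name>
    <value>org.apache.hadoop.fs.adl.AdlFileSystem</value>
  </property>
  <property>
    <name>fs.AbstractFileSystem.adl.impl</name>
    <value>org.apache.hadoop.fs.adl.Adl</value>
  </property>
  <property>
    <name>fs.adl.oauth2.access.token.provider.type</name>
    <value>ClientCredential</value>
  </property>
  <property>
    <name>fs.adl.oauth2.refresh.url</name>
    <value>https://login.microsoftonline.com/TENANT_ID/oauth2/token</value>
  </property>
  <property>
    <name>fs.adl.oauth2.client.id</name>
    <value>CLIENT_ID</value>
  </property>
  <property>
    <name>fs.adl.oauth2.credential</name>
    <value>CLIENT_SECRET</value>
  </property>
</configuration>
```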

Generate CSV files with headers

时间秒杀一切 submitted on 2020-01-06 08:45:08
Question: I am generating CSV outputs from tuples. I can't find a way of generating headers in the files. Could anyone confirm whether this is possible or not?

Answer 1: It might be possible, but I haven't found a built-in way. I added a flatMap function upstream that (a) converted the output POJO to a string, and (b) did a one-shot injection of the header row. But there are concerns with state recovery: you have to ensure the header always gets written, and only once.

Source: https://stackoverflow.com/questions/54463386/generate-csv
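A minimal sketch of the workaround the answer describes, assuming a (String, Int) tuple stream (field names and CSV layout are hypothetical). Each parallel instance would emit its own header, so run the operator with parallelism 1; as the answer notes, guaranteeing exactly one header across restarts would additionally require checkpointed state, which this sketch omits:

```scala
import org.apache.flink.api.common.functions.RichFlatMapFunction
import org.apache.flink.util.Collector

// Converts each tuple to a CSV line and injects the header row exactly
// once, before the first record this instance sees.
class CsvWithHeader(header: String)
    extends RichFlatMapFunction[(String, Int), String] {

  @transient private var headerWritten = false

  override def flatMap(value: (String, Int), out: Collector[String]): Unit = {
    if (!headerWritten) {
      out.collect(header)
      headerWritten = true
    }
    out.collect(s"${value._1},${value._2}")
  }
}

// Usage sketch:
//   stream.flatMap(new CsvWithHeader("name,count")).setParallelism(1)
//         .writeAsText("out.csv")
```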

Understanding Flink savepoints & checkpoints

↘锁芯ラ submitted on 2020-01-06 08:05:43
Question: Consider an Apache Flink streaming application with a pipeline like this:

Kafka-Source -> flatMap 1 -> flatMap 2 -> flatMap 3 -> Kafka-Sink

where every flatMap function is a non-stateful operator (e.g. the normal .flatMap function of a DataStream). How do checkpoints/savepoints work in case an incoming message is pending at flatMap 3 when a failure occurs? Will the message be reprocessed after restart, beginning from flatMap 1, or will it skip to flatMap 3? I am a bit confused, because the documentation …
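The question is cut off before any answer, but the recovery semantics are well defined: for a pipeline of stateless operators, a checkpoint essentially consists of the Kafka source offsets, so recovery rewinds the source rather than resuming at an intermediate operator. A small sketch of the setting involved:

```scala
import org.apache.flink.streaming.api.scala._

val env = StreamExecutionEnvironment.getExecutionEnvironment

// With checkpointing on, the Kafka source's offsets are snapshotted at
// each checkpoint. After a failure, Flink restores the last completed
// checkpoint and re-reads from those offsets, so a record that was in
// flight at flatMap 3 is replayed through flatMap 1 -> 2 -> 3 again;
// processing never resumes in the middle of the pipeline.
env.enableCheckpointing(10000L) // checkpoint every 10 seconds
```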

Flink 1.3, running a single job on YARN: how to set the number of task slots per TaskManager?

☆樱花仙子☆ submitted on 2020-01-06 07:19:30
Question: I am running a single Flink job on YARN as described here:

```
flink run -m yarn-cluster -yn 3 -ytm 12000
```

I can set the number of YARN nodes / task managers with the parameter -yn above. However, I want to know whether it is possible to set the number of task slots per TaskManager. When I use the parallelism (-p) parameter, it only sets the overall parallelism, and the number of task slots is computed by dividing this value by the number of provided task managers. I tried using the dynamic …
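The question breaks off at "dynamic", presumably referring to YARN dynamic properties. A sketch of the two ways this setting is commonly passed in 1.x-era Flink; the flag spellings are from memory, so verify them against flink run --help:

```sh
# -ys sets the number of task slots per TaskManager in yarn-cluster mode,
# e.g. 3 TaskManagers with 4 slots each (12 slots total):
flink run -m yarn-cluster -yn 3 -ys 4 -ytm 12000 path/to/job.jar

# The same setting expressed as a YARN dynamic property:
flink run -m yarn-cluster -yn 3 -yD taskmanager.numberOfTaskSlots=4 -ytm 12000 path/to/job.jar
```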

JSON data cannot be fetched in Flink when reading data from PostgreSQL

血红的双手。 submitted on 2020-01-06 06:05:55
Question: I was trying to fetch data from PostgreSQL using Flink. The following is the code (cut off in the original post; note that "newRowTypeInfo" should read "new RowTypeInfo("):

```java
dbData = env.createInput(JDBCInputFormat.buildJDBCInputFormat()
    .setDrivername(Utils.properties_fetch("drivername"))
    .setDBUrl(Utils.properties_fetch("dbURL"))
    .setUsername(Utils.properties_fetch("username"))
    .setPassword(Utils.properties_fetch("password"))
    .setQuery(sourcequery)
    .setRowTypeInfo(new RowTypeInfo(BasicTypeInfo.STRING_TYPE_INFO,
        BasicTypeInfo.STRING_TYPE_INFO, BasicTypeInfo.STRING_TYPE_INFO,
```
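Since the snippet is cut off mid-argument, here is a self-contained sketch of the full builder chain under assumed connection values (all placeholders). It also illustrates the PostgreSQL-specific point the title hints at: a json column has no dedicated Flink TypeInformation, so one common workaround is to cast it to text in the query and map it to a STRING field:

```scala
import org.apache.flink.api.common.typeinfo.BasicTypeInfo
import org.apache.flink.api.java.io.jdbc.JDBCInputFormat
import org.apache.flink.api.java.typeutils.RowTypeInfo
import org.apache.flink.api.scala._
import org.apache.flink.types.Row

object JdbcReadSketch {
  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment

    // One TypeInformation entry per selected column.
    val rowTypeInfo = new RowTypeInfo(
      BasicTypeInfo.STRING_TYPE_INFO,
      BasicTypeInfo.STRING_TYPE_INFO)

    val inputFormat = JDBCInputFormat.buildJDBCInputFormat()
      .setDrivername("org.postgresql.Driver")            // placeholder
      .setDBUrl("jdbc:postgresql://localhost:5432/mydb") // placeholder
      .setUsername("user")                               // placeholder
      .setPassword("secret")                             // placeholder
      // Cast the json column to text so it maps onto STRING_TYPE_INFO.
      .setQuery("SELECT id, payload::text FROM events")
      .setRowTypeInfo(rowTypeInfo)
      .finish()

    val dbData: DataSet[Row] = env.createInput(inputFormat)
    dbData.print()
  }
}
```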

Flink: can state be accessed across streams?

冷暖自知 submitted on 2020-01-06 06:05:23
Question: I have one stream that stores state, and I would like another stream to retrieve that state. Is this possible? I tried it in a unit test, but it doesn't seem to work.

Answer 1: It is currently not possible for different streams to share state. Even different operators belonging to the same stream cannot share state. The only thing you could play with is using static fields to share state across different threads, and thus also across streams. But this only works if different tasks are …
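The answer breaks off while describing the static-field workaround; the sketch below shows what it means, with hypothetical stream contents. A Scala object field behaves like a Java static, so all tasks running in the same TaskManager JVM see the same map. This breaks as soon as tasks land in different JVMs, is invisible to checkpointing, and gives no ordering guarantee between the two streams, so it is a hack rather than managed state:

```scala
import java.util.concurrent.ConcurrentHashMap

import org.apache.flink.streaming.api.scala._

// JVM-wide shared map (one instance per TaskManager process).
object SharedLookup {
  val table = new ConcurrentHashMap[String, String]()
}

object TwoStreamsSketch {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    // Stream 1 writes into the shared map as a side effect.
    env.fromElements("a=1", "b=2").map { kv =>
      val Array(k, v) = kv.split("=")
      SharedLookup.table.put(k, v)
      kv
    }.print()

    // Stream 2 reads from it; whether a key is present when it is read
    // depends entirely on scheduling, i.e. this is racy by design.
    env.fromElements("a", "b")
      .map(k => s"$k -> ${Option(SharedLookup.table.get(k)).getOrElse("<missing>")}")
      .print()

    env.execute("shared-static-state-sketch")
  }
}
```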
