apache-flink

Task not serializable in Flink

浪子不回头ぞ submitted on 2020-01-14 10:39:33
Question: I am trying to run the basic PageRank example in Flink with a small modification (only in reading the input file; everything else is the same). I am getting a "Task not serializable" error. Below is part of the error output:

```
at org.apache.flink.api.scala.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:179)
at org.apache.flink.api.scala.ClosureCleaner$.clean(ClosureCleaner.scala:171)
```

Below is my code (the snippet is cut off in the original post):

```scala
object hpdb {
  def main(args: Array[String]) {
    val env =
```
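The posted code is cut off, but the stack trace points at the ClosureCleaner's serializability check, which fails when a function passed to an operator captures something Flink cannot serialize. A minimal sketch of the usual cause and fix, with hypothetical names (not the poster's actual code):

```scala
import org.apache.flink.api.scala._

object ClosureFix {

  // Hypothetical helper that does NOT extend Serializable.
  class PageRankConfig { def damping: Double = 0.85 }

  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment
    val config = new PageRankConfig

    // BAD: `ranks.map(r => r * config.damping)` would capture `config`
    // in the closure and fail ClosureCleaner.ensureSerializable.

    // GOOD: copy the needed value into a local, serializable val so the
    // closure captures only a Double.
    val damping = config.damping
    val ranks = env.fromElements(1.0, 1.0, 1.0).map(r => r * damping)

    ranks.print()
  }
}
```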

What does it mean that “broadcast state” unblocks the implementation of the “dynamic patterns” feature for Flink’s CEP library?

烂漫一生 submitted on 2020-01-14 06:00:28
Question: From the Flink 1.5 release announcement, we know Flink now supports "broadcast state", described as follows: "broadcast state unblocks the implementation of the 'dynamic patterns' feature for Flink's CEP library." Does this mean we can now use "broadcast state" to implement "dynamic patterns" without Flink CEP? Also, I don't understand the difference between implementing "dynamic patterns" for Flink CEP with and without broadcast state. I would appreciate it if someone could give …
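For context, a minimal sketch of what "dynamic patterns via broadcast state" looks like outside of CEP, assuming Flink 1.5+; the Rule type and the single-predicate matching are hypothetical stand-ins for real pattern definitions. Rule updates arrive on a broadcast stream and are stored in broadcast state, so every parallel instance evaluates events against the current rule set:

```scala
import scala.collection.JavaConverters._

import org.apache.flink.api.common.state.MapStateDescriptor
import org.apache.flink.api.common.typeinfo.BasicTypeInfo
import org.apache.flink.streaming.api.functions.co.BroadcastProcessFunction
import org.apache.flink.streaming.api.scala._
import org.apache.flink.util.Collector

object DynamicRulesSketch {

  // Hypothetical rule type standing in for a dynamically updatable pattern.
  case class Rule(id: String, needle: String)

  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    val events: DataStream[String] = env.fromElements("error: disk full", "all ok")
    val rules: DataStream[Rule] = env.fromElements(Rule("r1", "error"))

    val ruleDescriptor = new MapStateDescriptor[String, Rule](
      "rules", BasicTypeInfo.STRING_TYPE_INFO, createTypeInformation[Rule])

    events
      .connect(rules.broadcast(ruleDescriptor))
      .process(new BroadcastProcessFunction[String, Rule, String] {

        override def processElement(
            event: String,
            ctx: BroadcastProcessFunction[String, Rule, String]#ReadOnlyContext,
            out: Collector[String]): Unit = {
          // Evaluate the event against every rule currently in broadcast state.
          for (entry <- ctx.getBroadcastState(ruleDescriptor).immutableEntries().asScala) {
            if (event.contains(entry.getValue.needle))
              out.collect(s"rule ${entry.getKey} matched: $event")
          }
        }

        override def processBroadcastElement(
            rule: Rule,
            ctx: BroadcastProcessFunction[String, Rule, String]#Context,
            out: Collector[String]): Unit = {
          // Rule updates are written to broadcast state, which Flink
          // replicates to all parallel instances of this operator.
          ctx.getBroadcastState(ruleDescriptor).put(rule.id, rule)
        }
      })
      .print()

    env.execute("dynamic-rules-sketch")
  }
}
```

What CEP's "dynamic patterns" adds on top of this is evaluating full pattern sequences (NFAs) rather than one predicate per rule; broadcast state provides the mechanism for distributing the rule updates.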

Integration - Apache Flink + Spring Boot

强颜欢笑 submitted on 2020-01-13 04:30:49
Question: I'm testing the integration between Apache Flink and Spring Boot. Running them in the IDE works fine, but when I tried to run on an Apache Flink cluster I got an exception related to the ClassLoader. The classes are really simple:

BootFlinkApplication

```java
@SpringBootApplication
@ComponentScan("com.example.demo")
public class BootFlinkApplication {
    public static void main(String[] args) {
        System.out.println("some test");
        SpringApplication.run(BootFlinkApplication.class, args);
    }
}
```

FlinkTest (the snippet is cut off in the original post)

```java
@Service
public
```
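The stack trace is cut off, but ClassLoader exceptions when submitting a Spring Boot fat jar to a cluster frequently come from Flink's child-first classloading clashing with dependencies that also exist on the cluster's classpath. A sketch of one common mitigation, assuming Flink 1.4+ (whether it applies here depends on the actual exception):

```yaml
# flink-conf.yaml (cluster side). Assumption: Flink 1.4 or later.
# Flink defaults to child-first classloading; switching to parent-first
# can resolve conflicts when the same library is present both in the
# user jar and on the cluster classpath.
classloader.resolve-order: parent-first
```

Packaging the job as a properly shaded fat jar (relocating conflicting dependencies) is the other usual approach.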

How to configure Flink to understand the Azure Data Lake file system?

喜你入骨 submitted on 2020-01-07 04:07:47
Question: I am using Flink to read data from Azure Data Lake, but Flink is not able to find the Azure Data Lake file system. How do I configure Flink to understand the Azure Data Lake file system? Could anyone guide me on this?

Answer 1: Flink can connect to any Hadoop-compatible file system (i.e. one that implements org.apache.hadoop.fs.FileSystem). See here for the explanation: https://ci.apache.org/projects/flink/flink-docs-release-0.8/example_connectors.html

In core-site.xml, you should …
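The answer breaks off at the core-site.xml part. Based on the Hadoop ADLS Gen1 connector (hadoop-azure-datalake), the configuration would plausibly look like the sketch below; the exact property set is an assumption since the original answer is truncated, and the placeholder credentials must be replaced:

```xml
<!-- core-site.xml: sketch assuming the hadoop-azure-datalake connector
     jars are on Flink's classpath; TENANT_ID/CLIENT_ID/CLIENT_SECRET
     are placeholders for an Azure service principal. -->
<configuration>
  <property>
    <name>fs.adl.impl</name>
    <value>org.apache.hadoop.fs.adl.AdlFileSystem</value>
  </property>
  <property>
    <name>fs.AbstractFileSystem.adl.impl</name>
    <value>org.apache.hadoop.fs.adl.Adl</value>
  </property>
  <property>
    <name>fs.adl.oauth2.access.token.provider.type</name>
    <value>ClientCredential</value>
  </property>
  <property>
    <name>fs.adl.oauth2.refresh.url</name>
    <value>https://login.microsoftonline.com/TENANT_ID/oauth2/token</value>
  </property>
  <property>
    <name>fs.adl.oauth2.client.id</name>
    <value>CLIENT_ID</value>
  </property>
  <property>
    <name>fs.adl.oauth2.credential</name>
    <value>CLIENT_SECRET</value>
  </property>
</configuration>
```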

Generate CSV files with headers

时间秒杀一切 submitted on 2020-01-06 08:45:08
Question: I am generating CSV outputs from tuples. I can't find a way of generating headers in the files. Could anyone confirm whether this is possible or not?

Answer 1: It might be possible, but I haven't found a built-in way. I added a flatMap function upstream that (a) converted the output POJO to a string, and (b) did a one-shot injection of the header row. But there are concerns with state recovery: you have to ensure the header always gets written, and only once.

Source: https://stackoverflow.com/questions/54463386/generate-csv
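A minimal sketch of the workaround the answer describes, assuming a (String, Int) tuple stream (field names and CSV layout are hypothetical). Each parallel instance would emit its own header, so run the operator with parallelism 1; as the answer notes, guaranteeing exactly one header across restarts would additionally require checkpointed state, which this sketch omits:

```scala
import org.apache.flink.api.common.functions.RichFlatMapFunction
import org.apache.flink.util.Collector

// Converts each tuple to a CSV line and injects the header row exactly
// once, before the first record this instance sees.
class CsvWithHeader(header: String)
    extends RichFlatMapFunction[(String, Int), String] {

  @transient private var headerWritten = false

  override def flatMap(value: (String, Int), out: Collector[String]): Unit = {
    if (!headerWritten) {
      out.collect(header)
      headerWritten = true
    }
    out.collect(s"${value._1},${value._2}")
  }
}

// Usage sketch:
//   stream.flatMap(new CsvWithHeader("name,count")).setParallelism(1)
//         .writeAsText("out.csv")
```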

Understanding Flink savepoints & checkpoints

↘锁芯ラ submitted on 2020-01-06 08:05:43
Question: Consider an Apache Flink streaming application with a pipeline like this:

Kafka-Source -> flatMap 1 -> flatMap 2 -> flatMap 3 -> Kafka-Sink

where every flatMap function is a non-stateful operator (e.g. the normal .flatMap function of a DataStream). How do checkpoints/savepoints work in case an incoming message is pending at flatMap 3 when a failure occurs? Will the message be reprocessed after restart, beginning from flatMap 1, or will it skip to flatMap 3? I am a bit confused, because the documentation …
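The question is cut off before any answer, but the recovery semantics are well defined: for a pipeline of stateless operators, a checkpoint essentially consists of the Kafka source offsets, so recovery rewinds the source rather than resuming at an intermediate operator. A small sketch of the setting involved:

```scala
import org.apache.flink.streaming.api.scala._

val env = StreamExecutionEnvironment.getExecutionEnvironment

// With checkpointing on, the Kafka source's offsets are snapshotted at
// each checkpoint. After a failure, Flink restores the last completed
// checkpoint and re-reads from those offsets, so a record that was in
// flight at flatMap 3 is replayed through flatMap 1 -> 2 -> 3 again;
// processing never resumes in the middle of the pipeline.
env.enableCheckpointing(10000L) // checkpoint every 10 seconds
```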

Flink 1.3, running a single job on YARN: how to set the number of task slots per TaskManager?

☆樱花仙子☆ submitted on 2020-01-06 07:19:30
Question: I am running a single Flink job on YARN as described here:

```
flink run -m yarn-cluster -yn 3 -ytm 12000
```

I can set the number of YARN nodes / task managers with the parameter -yn above. However, I want to know whether it is possible to set the number of task slots per TaskManager. When I use the parallelism (-p) parameter, it only sets the overall parallelism, and the number of task slots is computed by dividing this value by the number of provided task managers. I tried using the dynamic …
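The question breaks off at "dynamic", presumably referring to YARN dynamic properties. A sketch of the two ways this setting is commonly passed in 1.x-era Flink; the flag spellings are from memory, so verify them against flink run --help:

```sh
# -ys sets the number of task slots per TaskManager in yarn-cluster mode,
# e.g. 3 TaskManagers with 4 slots each (12 slots total):
flink run -m yarn-cluster -yn 3 -ys 4 -ytm 12000 path/to/job.jar

# The same setting expressed as a YARN dynamic property:
flink run -m yarn-cluster -yn 3 -yD taskmanager.numberOfTaskSlots=4 -ytm 12000 path/to/job.jar
```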

JSON data cannot be fetched in Flink when reading data from PostgreSQL

血红的双手。 submitted on 2020-01-06 06:05:55
Question: I was trying to fetch data from PostgreSQL using Flink. The following is the code (cut off in the original post; note that "newRowTypeInfo" should read "new RowTypeInfo("):

```java
dbData = env.createInput(JDBCInputFormat.buildJDBCInputFormat()
    .setDrivername(Utils.properties_fetch("drivername"))
    .setDBUrl(Utils.properties_fetch("dbURL"))
    .setUsername(Utils.properties_fetch("username"))
    .setPassword(Utils.properties_fetch("password"))
    .setQuery(sourcequery)
    .setRowTypeInfo(new RowTypeInfo(BasicTypeInfo.STRING_TYPE_INFO,
        BasicTypeInfo.STRING_TYPE_INFO, BasicTypeInfo.STRING_TYPE_INFO,
```
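Since the snippet is cut off mid-argument, here is a self-contained sketch of the full builder chain under assumed connection values (all placeholders). It also illustrates the PostgreSQL-specific point the title hints at: a json column has no dedicated Flink TypeInformation, so one common workaround is to cast it to text in the query and map it to a STRING field:

```scala
import org.apache.flink.api.common.typeinfo.BasicTypeInfo
import org.apache.flink.api.java.io.jdbc.JDBCInputFormat
import org.apache.flink.api.java.typeutils.RowTypeInfo
import org.apache.flink.api.scala._
import org.apache.flink.types.Row

object JdbcReadSketch {
  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment

    // One TypeInformation entry per selected column.
    val rowTypeInfo = new RowTypeInfo(
      BasicTypeInfo.STRING_TYPE_INFO,
      BasicTypeInfo.STRING_TYPE_INFO)

    val inputFormat = JDBCInputFormat.buildJDBCInputFormat()
      .setDrivername("org.postgresql.Driver")            // placeholder
      .setDBUrl("jdbc:postgresql://localhost:5432/mydb") // placeholder
      .setUsername("user")                               // placeholder
      .setPassword("secret")                             // placeholder
      // Cast the json column to text so it maps onto STRING_TYPE_INFO.
      .setQuery("SELECT id, payload::text FROM events")
      .setRowTypeInfo(rowTypeInfo)
      .finish()

    val dbData: DataSet[Row] = env.createInput(inputFormat)
    dbData.print()
  }
}
```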

Flink: can state be accessed across streams?

冷暖自知 submitted on 2020-01-06 06:05:23
Question: I have one stream that stores state, and I would like another stream to retrieve that state. Is this possible? I tried it in a unit test, but it doesn't seem to work.

Answer 1: It is currently not possible for different streams to share state. Even different operators belonging to the same stream cannot share state. The only thing you could play with is using static fields to share state across different threads, and thus also across streams. But this only works if different tasks are …
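The answer breaks off while describing the static-field workaround; the sketch below shows what it means, with hypothetical stream contents. A Scala object field behaves like a Java static, so all tasks running in the same TaskManager JVM see the same map. This breaks as soon as tasks land in different JVMs, is invisible to checkpointing, and gives no ordering guarantee between the two streams, so it is a hack rather than managed state:

```scala
import java.util.concurrent.ConcurrentHashMap

import org.apache.flink.streaming.api.scala._

// JVM-wide shared map (one instance per TaskManager process).
object SharedLookup {
  val table = new ConcurrentHashMap[String, String]()
}

object TwoStreamsSketch {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    // Stream 1 writes into the shared map as a side effect.
    env.fromElements("a=1", "b=2").map { kv =>
      val Array(k, v) = kv.split("=")
      SharedLookup.table.put(k, v)
      kv
    }.print()

    // Stream 2 reads from it; whether a key is present when it is read
    // depends entirely on scheduling, i.e. this is racy by design.
    env.fromElements("a", "b")
      .map(k => s"$k -> ${Option(SharedLookup.table.get(k)).getOrElse("<missing>")}")
      .print()

    env.execute("shared-static-state-sketch")
  }
}
```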
