apache-flink

how to run first example of Apache Flink

Submitted by 半腔热情 on 2020-05-17 06:31:52
Question: I am trying to run the first example from the O'Reilly book "Stream Processing with Apache Flink" and from the Flink project. Each gives a different error. The example from the book gives a NoClassDefFound error; the example from the Flink project gives java.net.ConnectException: Connection refused (Connection refused) but does create a Flink job, see screenshot. Details below. Book example: java.lang.BootstrapMethodError: java.lang.NoClassDefFoundError: scala/runtime/java8/JFunction1$mcVI$sp at io.github
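The class scala/runtime/java8/JFunction1$mcVI$sp only exists in the Scala 2.12 runtime, so this NoClassDefFoundError usually points at a Scala binary-version mismatch (for example, a job compiled against Scala 2.12 but run with 2.11 artifacts, or no scala-library on the classpath at all). A possible fix, sketched as a Maven fragment (the version number is illustrative, not taken from the original post):

```xml
<!-- Illustrative only: align every Scala artifact on the same 2.12 binary version -->
<dependency>
  <groupId>org.scala-lang</groupId>
  <artifactId>scala-library</artifactId>
  <version>2.12.8</version>
</dependency>
```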

Cannot launch flink from local host when trying to run it with webUI

Submitted by 天大地大妈咪最大 on 2020-05-17 05:53:12
Question: I'm trying to debug my Flink job from IntelliJ using the Flink UI. The problem is that it sometimes fails to launch, throwing java.net.BindException: Could not start rest endpoint on any port in port range 8081. My piece of code that should start the Flink UI (on Windows) is:

String osName = System.getProperty("os.name");
if (osName.toLowerCase().contains("win")) {
    Configuration conf = new Configuration();
    conf.setBoolean(ConfigConstants.LOCAL_START_WEBSERVER, true);
    env = StreamExecutionEnvironment
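The BindException means something is already listening on port 8081, often a previous debug run that never shut down or a standalone Flink cluster started earlier. As far as I know, Flink's RestOptions.BIND_PORT can be set to a range such as "8081-8099" so the local web UI picks a free port. As a Flink-free sketch, you can also check whether the default port is free before launching (PortCheck and isPortFree are hypothetical names, not part of any API):

```java
import java.io.IOException;
import java.net.ServerSocket;

public class PortCheck {
    /** Returns true if the given TCP port can currently be bound on this host. */
    public static boolean isPortFree(int port) {
        try (ServerSocket socket = new ServerSocket(port)) {
            return true;   // bind succeeded, so nobody is listening here
        } catch (IOException e) {
            return false;  // same condition that makes Flink's REST endpoint fail
        }
    }

    public static void main(String[] args) {
        // 8081 is Flink's default REST / web UI port
        System.out.println("port 8081 free: " + isPortFree(8081));
    }
}
```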

Flink - Why should I create my own RichSinkFunction instead of just open and close my PostgreSql connection?

Submitted by 不羁岁月 on 2020-05-14 11:25:48
Question: I would like to know why I really need to create my own RichSinkFunction or use JDBCOutputFormat to connect to the database, instead of just creating my connection, performing the query, and closing the connection with the traditional PostgreSQL driver inside my SinkFunction. I found many articles telling me to do that, but none explain why. What is the difference? Code example using JDBCOutputFormat:

JDBCOutputFormat jdbcOutput = JDBCOutputFormat.buildJDBCOutputFormat()
    .setDrivername("org
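The reason is lifecycle: a plain SinkFunction has no hook that runs once per parallel instance, so a JDBC connection created inside invoke() is opened and closed once per record, while RichSinkFunction's open()/close() let you create the connection once and reuse it for every record (JDBCOutputFormat does the same internally, plus batching). A minimal plain-Java sketch of the two lifecycles (NaiveSink and RichStyleSink are hypothetical stand-ins, not Flink classes):

```java
// Plain-Java stand-ins for the two sink lifecycles; hypothetical classes, not the Flink API.
class NaiveSink {
    int connectionsOpened = 0;
    // With no per-instance setup hook, the connection has to be
    // opened (and closed) inside invoke(), once per record.
    void invoke(String record) {
        connectionsOpened++;
    }
}

class RichStyleSink {
    int connectionsOpened = 0;
    void open() { connectionsOpened++; } // runs once per parallel instance, like RichSinkFunction.open()
    void invoke(String record) { }       // reuses the already-open connection
    void close() { }                     // releases it once, at shutdown
}

public class SinkLifecycleDemo {
    public static void main(String[] args) {
        NaiveSink naive = new NaiveSink();
        RichStyleSink rich = new RichStyleSink();
        rich.open();
        for (int i = 0; i < 1000; i++) {
            String record = "row-" + i;
            naive.invoke(record);
            rich.invoke(record);
        }
        rich.close();
        // 1000 connections versus 1 connection for the same stream of records
        System.out.println(naive.connectionsOpened + " vs " + rich.connectionsOpened);
    }
}
```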

flink - how to use state as cache

Submitted by ◇◆丶佛笑我妖孽 on 2020-04-30 15:13:31
Question: I want to read history from state. If the state is null, I read HBase, update the state, and use onTimer to set a state TTL. The problem is how to batch-read HBase, because reading a single record at a time from HBase is not efficient. Answer 1: In general, if you want to cache/mirror state from an external database in Flink, the most performant approach is to stream the database mutations into Flink -- in other words, turn Flink into a replication endpoint for the database's change data capture (CDC) stream,
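Short of full CDC, the pattern the question describes (check state, fall back to HBase on a miss, expire via a timer) can be sketched without Flink as a map-based cache with per-key expiry; in a real job the maps would be keyed ValueState, the loader an HBase lookup, and the expiry an onTimer callback. TtlCache and all its names are hypothetical, not any library's API:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch of "state as cache with TTL"; not Flink's ValueState/onTimer API. */
public class TtlCache {
    private final long ttlMillis;
    private final Map<String, String> values = new HashMap<>();  // stands in for keyed ValueState
    private final Map<String, Long> expiresAt = new HashMap<>(); // stands in for a registered timer

    public TtlCache(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    /** Returns the cached value, calling loader (the "HBase read") only on a miss or expiry. */
    public String get(String key, long now, Function<String, String> loader) {
        Long deadline = expiresAt.get(key);
        if (deadline == null || deadline <= now) { // state is null or the TTL timer has fired
            values.put(key, loader.apply(key));    // fall back to the backing store
            expiresAt.put(key, now + ttlMillis);   // re-arm the expiry
        }
        return values.get(key);
    }

    public static void main(String[] args) {
        TtlCache cache = new TtlCache(60_000);     // one-minute TTL
        Function<String, String> loader = k -> "history-for-" + k;
        System.out.println(cache.get("user-1", 0L, loader));
    }
}
```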

“Stream Processing with Apache Flink” how to run book code from IntelliJ?

Submitted by 醉酒当歌 on 2020-04-18 05:48:58
Question: As described in this post, I have been unable to successfully run any code from the book "Stream Processing with Apache Flink", including the precompiled jar. It is not my practice to use an IDE, but I thought I would try IntelliJ, as Chapter 3, "Run and Debug Flink Applications in an IDE", describes how to do that specifically for the code for this book. The book describes a project import process that I have not found a way to use. It describes setting options on import, for example select

Can I have multiple subtasks of an operator in the same slot, in Flink?

Submitted by 瘦欲@ on 2020-04-17 14:24:07
Question: I have been exploring Apache Flink for a few days, and I have some doubts about the concept of a Task Slot. Although several questions have been asked about it, there is a point I don't get. I am using a toy application for testing, running a local cluster, and I have disabled operator chaining. I know from the docs that slots allow for memory isolation but not CPU isolation, and reading the docs it seems that a Task Slot is a Java thread. 1) When I deploy my application with parallelism=1, all the

Testing Flink with embedded Kafka

Submitted by 百般思念 on 2020-04-16 05:45:10
Question: I have a simple Flink application, which sums up the events with the same id and timestamp within the last minute:

DataStream<String> input = env
    .addSource(consumerProps)
    .uid("app");

DataStream<Event> events = input.map(record -> mapper.readValue(record, Event.class));

pixels
    .assignTimestampsAndWatermarks(new TimestampsAndWatermarks())
    .keyBy("id")
    .timeWindow(Time.minutes(1))
    .sum("constant")
    .addSink(simpleNotificationServiceSink);

env.execute(jobName);

private static class