apache-flink

AsyncFunction - Bug collecting throwable in unordered mode

末鹿安然 submitted on 2019-12-11 04:59:37
Question: I am experiencing an infinite loop when using an AsyncFunction in unordered mode. It can be reproduced with the following code:
import org.apache.flink.streaming.api.datastream.AsyncDataStream;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.async.AsyncFunction;
import org.junit.Test;
import java.util.Arrays;
import java.util.Collections;
import java.util
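The reproduction code above is cut off, so here is a minimal, self-contained sketch (not taken from the original post) of an AsyncFunction used with AsyncDataStream.unorderedWait that reports a failure through its ResultFuture; the class name, input values, timeout and queue capacity are illustrative.

import org.apache.flink.streaming.api.datastream.AsyncDataStream;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.async.AsyncFunction;
import org.apache.flink.streaming.api.functions.async.ResultFuture;

import java.util.Collections;
import java.util.concurrent.TimeUnit;

public class AsyncUnorderedSketch {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        DataStream<Integer> input = env.fromElements(1, 2, 3);

        // Async function that completes one element exceptionally instead of emitting a result.
        AsyncFunction<Integer, String> failing = new AsyncFunction<Integer, String>() {
            @Override
            public void asyncInvoke(Integer value, ResultFuture<String> resultFuture) {
                if (value == 2) {
                    // Reporting the error through the future is the path the question
                    // suspects misbehaves in unordered mode.
                    resultFuture.completeExceptionally(new RuntimeException("boom"));
                } else {
                    resultFuture.complete(Collections.singleton("ok-" + value));
                }
            }
        };

        // Unordered mode: results are emitted as soon as their futures complete.
        AsyncDataStream.unorderedWait(input, failing, 1, TimeUnit.SECONDS, 10).print();

        env.execute("async-unordered-sketch");
    }
}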

Flink : Rowtime attributes must not be in the input rows of a regular join

孤人 submitted on 2019-12-11 04:48:04
Question: Using the Flink SQL API, I want to join multiple tables together and do some computation over a time window. I have three tables coming from CSV files, and one coming from Kafka. In the Kafka table, I have a field timestampMs that I want to use for my time window operations. For that I wrote the following code:
StreamExecutionEnvironment env = ... ;
StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env);
TableSource table1 = CsvTableSource.builder()
    .path("path/to/file1.csv")
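Since the setup code above is truncated, the following is only a sketch of how the CSV side of such a setup might look with CsvTableSource; the paths, field names and SQL query are placeholders, and the Kafka table carrying the timestampMs rowtime attribute (the part that actually triggers the "Rowtime attributes must not be in the input rows of a regular join" error) is not shown.

import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.java.StreamTableEnvironment;
import org.apache.flink.table.sources.CsvTableSource;
import org.apache.flink.types.Row;

public class CsvJoinSketch {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env);

        // CSV-backed sources as described in the question; paths and fields are placeholders.
        CsvTableSource table1 = CsvTableSource.builder()
                .path("path/to/file1.csv")
                .field("id", Types.STRING)
                .field("value1", Types.LONG)
                .build();
        CsvTableSource table2 = CsvTableSource.builder()
                .path("path/to/file2.csv")
                .field("id", Types.STRING)
                .field("value2", Types.LONG)
                .build();
        tableEnv.registerTableSource("table1", table1);
        tableEnv.registerTableSource("table2", table2);

        // A regular (non-windowed) join. The reported error appears as soon as one of the
        // joined inputs exposes a rowtime attribute, e.g. timestampMs on the Kafka table.
        Table joined = tableEnv.sqlQuery(
                "SELECT t1.id, t1.value1, t2.value2 FROM table1 t1 JOIN table2 t2 ON t1.id = t2.id");

        // Retract stream conversion handles updating results from the regular join.
        tableEnv.toRetractStream(joined, Row.class).print();
        env.execute();
    }
}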

Is the concept of groupwin similar to unaligned windows?

守給你的承諾、 submitted on 2019-12-11 04:36:30
Question: groupwin: I use the meaning from Esper: "This view groups events into sub-views by the value returned by the specified expression or the combination of values returned by a list of expressions." I understand this as having the ability to operate per group rather than on the whole stream (the group by clause only controls how aggregations are grouped). Unaligned windows: in Google's Dataflow, unaligned windows means: "By unaligned windows, we mean windows which do not span the entirety of a data source, but instead only a
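To make the comparison concrete in Flink terms (this example is not from the original question), the sketch below keys a stream, which splits it into per-key groups comparable to groupwin's sub-views, and applies session windows, which open and close independently per key and therefore do not span the whole data source, i.e. they are unaligned; the key field, gap and input are illustrative.

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.ProcessingTimeSessionWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

public class SessionWindowSketch {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.fromElements(
                Tuple2.of("user-a", 1L),
                Tuple2.of("user-b", 1L),
                Tuple2.of("user-a", 1L))
            // Grouping by the first tuple field: each key gets its own windows and state.
            .keyBy(0)
            // Session windows close after 5 minutes of inactivity per key, so their
            // boundaries differ from key to key instead of being globally aligned.
            .window(ProcessingTimeSessionWindows.withGap(Time.minutes(5)))
            .sum(1)
            // With this tiny bounded input and processing time the printed output is only
            // illustrative; the point is the per-key, unaligned windowing.
            .print();

        env.execute("session-window-sketch");
    }
}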

EOFException related to memory segments during run of Beam pipeline on Flink

爱⌒轻易说出口 submitted on 2019-12-11 03:06:57
Question: I'm trying to run an Apache Beam pipeline on Flink on our test cluster. It has been failing with an EOFException at org.apache.flink.runtime.io.disk.SimpleCollectingOutputView:79 during the encoding of an object through serialisation. I haven't been able to reproduce the error locally yet. You can find the entire job log here. Some values have been replaced with fake data. The command used to run the pipeline:
bin/flink run \
  -m yarn-cluster \
  --yarncontainer 1 \
  --yarnslots 4 \
  -
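This kind of EOFException in SimpleCollectingOutputView is usually associated with records not fitting into the memory segments that back Flink's managed memory. As a direction to explore (an assumption, not a confirmed fix for this particular job), one could give the task managers more heap and managed memory in conf/flink-conf.yaml; the keys are standard Flink 1.x options and the values below are purely illustrative.

# conf/flink-conf.yaml (illustrative values, not a confirmed fix)
taskmanager.heap.size: 4096m
taskmanager.memory.fraction: 0.7
taskmanager.memory.segment-size: 64kb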

Apache Flink: How can I read a DataStream/DataSet from Cassandra?

浪尽此生 submitted on 2019-12-11 00:47:16
Question: I tried to treat Cassandra as the source of data in Flink using the information provided in the following links:
Read data from Cassandra for processing in Flink
https://www.javatips.net/api/flink-master/flink-examples/flink-examples-streaming/src/main/java/org/apache/flink/streaming/examples/async/AsyncIOExample.java
I got an AsyncWaitOperator exception when I ran the task. According to the first link, this exception occurs due to a network problem. However, the strange thing is that I am
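For the batch (DataSet) side of the question, the flink-connector-cassandra module provides a CassandraInputFormat. The sketch below is not from the question itself; the contact point, keyspace, table and column names are placeholders, and it assumes the Cassandra connector and driver are on the classpath.

import com.datastax.driver.core.Cluster;
import org.apache.flink.api.common.typeinfo.TypeHint;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.batch.connectors.cassandra.CassandraInputFormat;
import org.apache.flink.streaming.connectors.cassandra.ClusterBuilder;

public class CassandraSourceSketch {

    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Describes how to connect to the cluster; the contact point is a placeholder.
        ClusterBuilder clusterBuilder = new ClusterBuilder() {
            @Override
            protected Cluster buildCluster(Cluster.Builder builder) {
                return builder.addContactPoint("127.0.0.1").build();
            }
        };

        // CQL query and schema are placeholders.
        CassandraInputFormat<Tuple2<String, Integer>> inputFormat =
                new CassandraInputFormat<>("SELECT id, value FROM my_keyspace.my_table;", clusterBuilder);

        DataSet<Tuple2<String, Integer>> rows = env.createInput(
                inputFormat,
                TypeInformation.of(new TypeHint<Tuple2<String, Integer>>() {}));

        // print() triggers execution of the batch job.
        rows.print();
    }
}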

logback not working in Flink

纵饮孤独 submitted on 2019-12-10 23:17:30
Question: I have a single-node Flink instance which has the required jars for logback in the lib folder (logback-classic.jar, logback-core.jar, log4j-over-slf4j.jar). I have removed the jars for log4j from the lib folder (log4j-1.2.17.jar, slf4j-log4j12-1.7.7.jar). 'logback.xml' is also correctly updated in the 'conf' folder. I have also included 'logback.xml' in the classpath, although this does not seem to be considered while the job is run. Flink refers to logback.xml inside the conf folder only. I have
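For reference, a minimal logback.xml along the lines of the one Flink ships in its conf directory might look like the sketch below; the appender, pattern and log level are illustrative, not taken from the question.

<configuration>
    <appender name="file" class="ch.qos.logback.core.FileAppender">
        <!-- Flink passes the log file location through the log.file system property. -->
        <file>${log.file}</file>
        <encoder>
            <pattern>%d{yyyy-MM-dd HH:mm:ss,SSS} %-5level %logger{60} - %msg%n</pattern>
        </encoder>
    </appender>
    <root level="INFO">
        <appender-ref ref="file"/>
    </root>
</configuration>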

Flink Job suddenly crashed with error: Encountered error while consuming partitions

落花浮王杯 submitted on 2019-12-10 22:50:08
Question: I have a streaming job that failed after running for 1 day and 10 hours. One of the subtasks suddenly failed and crashed the whole job. Since I set up a restart_strategy, the job automatically restarted but crashed again with the same error. I found the log of the Task Manager that the failed task was running on, but it is not very helpful for debugging this. Can anyone suggest a better way? Thank you. Job manager log around the failure:
2019-05-09 19:50:59,230 INFO org.apache.flink.runtime.checkpoint
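For context, a restart_strategy like the one mentioned can be configured programmatically as in the sketch below; the attempt count, delay and dummy pipeline are illustrative, and this only controls how the job restarts, it does not address the underlying "Encountered error while consuming partitions" failure.

import java.util.concurrent.TimeUnit;
import org.apache.flink.api.common.restartstrategy.RestartStrategies;
import org.apache.flink.api.common.time.Time;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class RestartStrategySketch {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Retry the job a fixed number of times with a delay between attempts.
        env.setRestartStrategy(RestartStrategies.fixedDelayRestart(
                3,                                // number of restart attempts
                Time.of(30, TimeUnit.SECONDS)));  // delay between attempts

        // Placeholder pipeline standing in for the real job topology.
        env.fromElements(1, 2, 3).print();

        env.execute("restart-strategy-sketch");
    }
}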

Flink with Ceph as the persistent storage

我的未来我决定 submitted on 2019-12-10 17:52:17
Question: The Flink documentation suggests that Ceph can be used as persistent storage for state. https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/stream/checkpointing.html Considering that Ceph is a transactional database, wouldn't it have an adverse effect on Flink's performance? Answer 1: Ceph describes itself as a "unified, distributed storage system" and provides a network file system API. As such, it should work seamlessly with Flink's state backends that persist checkpoints to a
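As a sketch of what persisting checkpoints to such a filesystem looks like in code (not part of the answer above), the snippet below enables checkpointing and points the FsStateBackend at a placeholder URI; the interval, bucket and path are illustrative, and it assumes the Ceph storage is exposed through a filesystem scheme that Flink is configured to resolve (for example an S3-compatible gateway).

import org.apache.flink.runtime.state.filesystem.FsStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CephCheckpointSketch {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Take a checkpoint every 10 seconds; the interval is illustrative.
        env.enableCheckpointing(10_000);

        // Placeholder checkpoint URI on Ceph-backed storage.
        env.setStateBackend(new FsStateBackend("s3://my-ceph-bucket/flink/checkpoints"));

        // Placeholder pipeline standing in for the real job.
        env.fromElements(1, 2, 3).print();

        env.execute("ceph-checkpoint-sketch");
    }
}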

Extracting weights from FlinkML Multiple Linear Regression

核能气质少年 submitted on 2019-12-10 17:42:59
Question: I am running the multiple linear regression example for Flink (0.10-SNAPSHOT). I can't figure out how to extract the weights (e.g. slope and intercept, beta0 and beta1, whatever you want to call them). I'm not super seasoned in Scala; that is probably half my problem. Thanks for any help anyone can give.
object Job {
  def main(args: Array[String]) {
    // set up the execution environment
    val env = ExecutionEnvironment.getExecutionEnvironment
    val survival = env.readCsvFile[(String, String, String,

Apache Flink Dynamically setting JVM_OPT env.java.opts

半腔热情 submitted on 2019-12-10 17:20:07
Question: Is it possible to set custom JVM options (env.java.opts) when submitting a job, without specifying them in the conf/flink-conf.yaml file? The reason I am asking is that I want to use some custom variables in my log4j configuration. I am also running my job on YARN. I have tried the following command using the CLI, and it strips everything off from the = sign onwards:
$ flink run -m yarn-cluster -yn 2 -yst -yD env.java.opts="-DappName=myapp -DcId=mycId"
Answer 1: At the moment this is not possible due to the way Flink
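For comparison, the static alternative the question is trying to avoid is the corresponding entry in conf/flink-conf.yaml, which applies to the whole cluster rather than a single submission; the property values below are the ones from the question.

# conf/flink-conf.yaml
env.java.opts: -DappName=myapp -DcId=mycId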