apache-flink

Apache Flink vs Apache Spark as platforms for large-scale machine learning?

心已入冬 submitted on 2019-12-02 15:15:08
Could anyone compare Flink and Spark as platforms for machine learning? Which is potentially better for iterative algorithms? Link to the general Flink vs Spark discussion: What is the difference between Apache Spark and Apache Flink?

Answer (Fabian Hueske): Disclaimer: I'm a PMC member of Apache Flink. My answer focuses on the differences in executing iterations in Flink and Spark. Apache Spark executes iterations by loop unrolling: for each iteration, a new set of tasks/operators is scheduled and executed. Spark does that very efficiently because it is very good at low-latency task…
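By contrast, Flink has native iteration support in its batch API: the step function is scheduled once and the runtime loops over it. A minimal sketch of the DataSet iteration API, with an illustrative step function and values:

import org.apache.flink.api.scala._

object IterationSketch {
  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment
    val initial: DataSet[Double] = env.fromElements(1.0, 2.0, 3.0)

    // The step function is scheduled once; the runtime feeds the result of
    // each superstep back as the input of the next one instead of unrolling
    // a new set of operators per iteration.
    val result = initial.iterate(10) { current => current.map(_ * 0.9) }

    result.print()
  }
}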

Flink: InputStream of class class org.apache.commons.compress.archivers.zip.ZipFile$1 is not implementing InputStreamStatistics

∥☆過路亽.° submitted on 2019-12-02 11:02:46
Question: I was trying to load an Excel file into a POI workbook in a Flink program and got an error like this:

Caused by: java.lang.IllegalArgumentException: InputStream of class class org.apache.commons.compress.archivers.zip.ZipFile$1 is not implementing InputStreamStatistics.
    at org.apache.poi.openxml4j.util.ZipArchiveThresholdInputStream.<init>(ZipArchiveThresholdInputStream.java:63)
    at org.apache.poi.openxml4j.util.ZipSecureFile.getInputStream(ZipSecureFile.java:147)
    at org.apache.poi.openxml4j.util…
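A common workaround (a sketch, not necessarily the accepted fix for this exact question) is to avoid handing POI a wrapped zip stream at all: copy the incoming stream to a temporary file and open it through WorkbookFactory's File overload, which does not go through the InputStreamStatistics check. Aligning the commons-compress version with what your POI release expects is the other usual remedy. The helper below is hypothetical:

import java.io.{File, InputStream}
import java.nio.file.{Files, StandardCopyOption}
import org.apache.poi.ss.usermodel.{Workbook, WorkbookFactory}

// Hypothetical helper: materialize the stream to a temp file so POI
// opens the workbook via its File-based code path.
def openWorkbook(in: InputStream): Workbook = {
  val tmp = File.createTempFile("sheet", ".xlsx")
  tmp.deleteOnExit()
  Files.copy(in, tmp.toPath, StandardCopyOption.REPLACE_EXISTING)
  WorkbookFactory.create(tmp)
}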

Flink 1.3.1 / Elasticsearch 5.5.1: ElasticsearchSinkFunction fails with java.lang.NoSuchMethodError

久未见 submitted on 2019-12-02 10:43:53
I'm going through the following samples using Scala / sbt: the flink / elasticsearch / kibana Flink tutorial. My build.sbt includes the following versions:

libraryDependencies ++= Seq(
  "org.apache.flink" %% "flink-scala" % "1.3.1" % "provided",
  "org.apache.flink" %% "flink-streaming-scala" % "1.3.1" % "provided",
  "org.apache.flink" %% "flink-clients" % "1.3.1" % "provided",
  "joda-time" % "joda-time" % "2.9.9",
  "com.google.guava" % "guava" % "22.0",
  "com.typesafe" % "config" % "1.3.0",
  "org.apache.flink" % "flink-connector-kafka-0.10_2.10" % "1.2.0",
  "org.elasticsearch" % "elasticsearch" % "5.5.1",
  "org…
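A likely culprit is the version skew visible in the dependency list: a 1.2.0 Kafka connector (built for Scala 2.10) mixed with 1.3.1 core artifacts is a classic source of NoSuchMethodError at runtime. A sketch of an aligned build.sbt, assuming the Elasticsearch 5 connector that ships with Flink 1.3.x:

libraryDependencies ++= Seq(
  "org.apache.flink" %% "flink-scala"                    % "1.3.1" % "provided",
  "org.apache.flink" %% "flink-streaming-scala"          % "1.3.1" % "provided",
  "org.apache.flink" %% "flink-clients"                  % "1.3.1" % "provided",
  // Connector versions should match the Flink core version, and %% keeps
  // the Scala binary version consistent across all Flink artifacts.
  "org.apache.flink" %% "flink-connector-kafka-0.10"     % "1.3.1",
  "org.apache.flink" %% "flink-connector-elasticsearch5" % "1.3.1"
)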

Unable to execute CEP pattern in Flink dashboard version 1.3.2, caused by ClassNotFoundException

泪湿孤枕 submitted on 2019-12-02 10:30:52
I have written a simple pattern like this:

Pattern<JoinedEvent, ?> pattern = Pattern.<JoinedEvent>begin("start")
    .where(new SimpleCondition<JoinedEvent>() {
        @Override
        public boolean filter(JoinedEvent streamEvent) throws Exception {
            return streamEvent.getRRInterval() >= 10;
        }
    })
    .within(Time.milliseconds(WindowLength));

and it executes well in IntelliJ IDEA. I am using Flink 1.3.2 both in the dashboard and in IntelliJ IDEA. While I was building Flink from source, I saw a lot of warning messages, which led me to believe that the iterative condition classes were not included in a jar, as the error…
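The usual cause of a ClassNotFoundException for CEP classes when the same job runs fine in the IDE is packaging: flink-cep is not part of the Flink 1.3.x distribution's lib/ directory, so it must be bundled into the job jar submitted through the dashboard. A build sketch, assuming an sbt fat-jar build (a Maven shade setup would be equivalent):

// Do NOT mark flink-cep as "provided": the cluster does not ship it,
// so it has to travel inside the assembled job jar.
libraryDependencies += "org.apache.flink" %% "flink-cep" % "1.3.2"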

How to attach a HashMap to a Configuration object in Flink?

纵饮孤独 submitted on 2019-12-02 09:02:45
Question: I want to share a HashMap across every node in Flink and allow the nodes to update that HashMap. I have this code so far:

object ParallelStreams {
  val env = StreamExecutionEnvironment.getExecutionEnvironment

  // Is there a way to attach a HashMap to this config variable?
  val config = new Configuration()
  config.setClass("HashMap", Class[CustomGlobal])
  env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)

  class CustomGlobal extends ExecutionConfig.GlobalJobParameters {
    override def toMap: util.Map[String, String] = {
      new HashMap[String, String]()
    }
  }

  class MyCoMap extends RichCoMapFunction…
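For read-only parameters, the supported route is ExecutionConfig.setGlobalJobParameters rather than attaching objects to a Configuration. Note that the values are shipped once at job submission, so tasks cannot mutate them and see each other's updates; for that you would need broadcast streams or an external store. A sketch along those lines, with illustrative keys:

import java.util
import org.apache.flink.api.common.ExecutionConfig
import org.apache.flink.streaming.api.scala._

object GlobalParamsSketch {
  // Read-only parameters, distributed once when the job is submitted.
  class CustomGlobal(values: Map[String, String]) extends ExecutionConfig.GlobalJobParameters {
    override def toMap: util.Map[String, String] = {
      val m = new util.HashMap[String, String]()
      values.foreach { case (k, v) => m.put(k, v) }
      m
    }
  }

  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.getConfig.setGlobalJobParameters(new CustomGlobal(Map("mode" -> "test")))
  }
}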

Apache Flink - simple job windowing problem - java.lang.RuntimeException: segment has been freed - MiniCluster problem

自作多情 submitted on 2019-12-02 08:41:17
Hi, I am a Flink newbie and in my job I am trying to use windowing to simply aggregate elements, to enable delayed processing:

src = src.timeWindowAll(Time.milliseconds(1000))
         .process(new BaseDelayingProcessAllWindowFunctionImpl());

The process-window function simply collects the input elements:

public class BaseDelayingProcessAllWindowFunction<IN> extends ProcessAllWindowFunction<IN, IN, TimeWindow> {
    private static final long serialVersionUID = 1L;
    protected Logger logger;
    public…
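For reference, a Scala sketch of the pass-through window function described above (the Java original is truncated); it buffers elements for one window and re-emits them unchanged, which delays the stream by the window length:

import org.apache.flink.streaming.api.scala.function.ProcessAllWindowFunction
import org.apache.flink.streaming.api.windowing.windows.TimeWindow
import org.apache.flink.util.Collector

class DelayingProcessAllWindowFunction[IN] extends ProcessAllWindowFunction[IN, IN, TimeWindow] {
  // Re-emit every buffered element once the window fires.
  override def process(context: Context, elements: Iterable[IN], out: Collector[IN]): Unit =
    elements.foreach(out.collect)
}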

Is it possible to use Riak CS with Apache Flink?

守給你的承諾、 submitted on 2019-12-02 08:40:46
I want to configure the filesystem state backend and ZooKeeper recovery mode:

state.backend: filesystem
state.backend.fs.checkpointdir: ???
recovery.mode: zookeeper
recovery.zookeeper.storageDir: ???

As you can see, I should specify the checkpointdir and storageDir parameters, but I don't have any of the file systems supported by Apache Flink (like HDFS or Amazon S3). However, I have installed a Riak CS cluster (which seems to be S3-compatible). So, can I use Riak CS together with Apache Flink? And if it is possible: how do I configure Apache Flink to work with Riak CS?

Answer: How to join Apache Flink and Riak CS? Riak…
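Since Riak CS speaks the S3 protocol, the usual approach is to point Hadoop's S3 filesystem at the Riak CS endpoint and then use S3 URIs in the Flink configuration. A sketch, where the endpoint, bucket names, and credentials are placeholders:

# flink-conf.yaml sketch (bucket and paths are placeholders)
state.backend: filesystem
state.backend.fs.checkpointdir: s3a://flink-bucket/checkpoints
recovery.mode: zookeeper
recovery.zookeeper.storageDir: s3a://flink-bucket/recovery

# core-site.xml sketch: point the Hadoop s3a filesystem at Riak CS
# <property><name>fs.s3a.endpoint</name><value>http://riak-cs.example.com:8080</value></property>
# <property><name>fs.s3a.access.key</name><value>ACCESS_KEY</value></property>
# <property><name>fs.s3a.secret.key</name><value>SECRET_KEY</value></property>
# <property><name>fs.s3a.path.style.access</name><value>true</value></property>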

Flink latency metrics not being shown

戏子无情 submitted on 2019-12-02 08:12:40
While running Flink 1.5.0 with a local environment, I was trying to get latency metrics via REST (with something similar to http://localhost:8081/jobs/e779dbbed0bfb25cd02348a2317dc8f1/vertices/e70bbd798b564e0a50e10e343f1ac56b/metrics ), but there isn't any reference to latency. All of this while latency tracking is enabled, which I confirmed by checking with the debugger that the LatencyMarksEmitter is emitting the marks. What can I be doing wrong?

Answer: In 1.5, latency metrics aren't exposed for tasks but for jobs instead, the reasoning being that latency metrics inherently contain information…
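Given that explanation, the metric should show up under the job-scoped metrics endpoint rather than the vertex one, e.g. (reusing the job ID from the question):

GET http://localhost:8081/jobs/e779dbbed0bfb25cd02348a2317dc8f1/metrics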

How to concatenate two streams in Apache Flink

徘徊边缘 submitted on 2019-12-02 07:45:02
Question: For example, I want to compose the streams 1, 2, 3 and 4, 5 into a single one, so the result should be: 1, 2, 3, 4, 5. In other words: if the first source is exhausted, get elements from the second one. My closest attempt, which unfortunately does not preserve item order, is:

val a = streamEnvironment.fromElements(1, 2, 3)
val b = streamEnvironment.fromElements(4, 5)
val c = a.union(b)
c.map(x => println(s"X=$x")) // X=4, 5, 1, 2, 3 or something like that

I also made a similar attempt with datetime included, but with…
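union gives no ordering guarantee across sources. One workaround (a sketch only; it makes the origin of each element visible rather than truly sequencing the sources) is to tag elements with a source index so a downstream operator can buffer and reorder:

import org.apache.flink.streaming.api.scala._

object ConcatSketch {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    // Tag each element with its source index; union itself interleaves
    // nondeterministically, but the tag lets a downstream stage hold back
    // source-1 elements until source-0 is exhausted.
    val a = env.fromElements(1, 2, 3).map(x => (0, x))
    val b = env.fromElements(4, 5).map(x => (1, x))
    val c = a.union(b)
    c.print()
    env.execute("concat-sketch")
  }
}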
