apache-flink

“No Metrics” in Flink web UI

Submitted by 被刻印的时光 ゝ on 2019-12-11 14:26:07

Question: I started a local Flink server (./bin/start-cluster.sh) and submitted a job. I have the following code to define a metric:

```java
.map(new RichMapFunction<String, String>() {
    private transient Counter counter;

    @Override
    public void open(Configuration config) {
        this.counter = getRuntimeContext()
            .getMetricGroup()
            .counter("myCounter");
    }

    @Override
    public String map(String value) throws Exception {
        this.counter.inc();
        return value;
    }
})
```

but when I run the job and send some data, I cannot see any
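User-defined metrics do not show up on the job overview page of the web UI; they are listed in the Metrics tab of the running task, where they have to be selected by name. A minimal sketch, assuming the job above is otherwise unchanged, that nests the counter in a named group so it is easier to locate in that tab:

```java
// Variant of the open() above: MetricGroup.addGroup(String) nests the counter,
// so it appears as <operator scope>.MyMetrics.myCounter in the task's Metrics tab.
@Override
public void open(Configuration config) {
    this.counter = getRuntimeContext()
        .getMetricGroup()
        .addGroup("MyMetrics")
        .counter("myCounter");
}
```

The counter is only registered when the task is deployed, so it will not appear before the job reaches the RUNNING state.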

Mapping values returns nothing in Scala Flink

Submitted by 杀马特。学长 韩版系。学妹 on 2019-12-11 14:10:09

Question: I am developing a discretization algorithm in Flink, but I am having problems applying a map function. The discretization is stored in V, which is a private[this] val:

```scala
private[this] val V = Vector.tabulate(nAttrs)(i => IntervalHeap(nBins, i, s))
```

This Vector is updated in the following method:

```scala
private[this] def updateSamples(v: LabeledVector): Vector[IntervalHeap] = {
  val attrs = v.vector.map(_._2)
  // TODO: Check for missing values
  attrs
    .zipWithIndex
    .foreach { case (attr, i) =>
      if (V(i).nInstances < s) {
        V(i)
```

flink org.apache.flink.table.api.NoMatchingTableFactoryException

Submitted by …衆ロ難τιáo~ on 2019-12-11 14:05:24

Question: I'm using the Flink Table API, with Kafka as the input source and JSON as the table format. I got this error when submitting my program:

```
The program finished with the following exception:

org.apache.flink.client.program.ProgramInvocationException: The main method caused an error.
	at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:546)
	at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:421)
	at org.apache.flink
```
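A NoMatchingTableFactoryException usually means that no TableFactory on the classpath matches the given connector and format properties; the most common causes are connector/format jars (e.g. flink-connector-kafka and flink-json) missing from the submitted fat jar, or descriptor properties that do not match the bundled connector version. A minimal sketch of a Kafka/JSON source registration with the 1.x descriptor API; tableEnv, the topic, the broker address and the schema fields are all placeholders, not taken from the question:

```java
import org.apache.flink.table.api.Types;
import org.apache.flink.table.descriptors.Json;
import org.apache.flink.table.descriptors.Kafka;
import org.apache.flink.table.descriptors.Schema;

// Assumes tableEnv is a StreamTableEnvironment and that flink-connector-kafka
// and flink-json are packaged into the job jar.
tableEnv.connect(new Kafka()
        .version("universal")                           // must match the connector jar on the classpath
        .topic("my-topic")                              // placeholder
        .property("bootstrap.servers", "localhost:9092"))
    .withFormat(new Json().failOnMissingField(false))
    .withSchema(new Schema()
        .field("user", Types.STRING())
        .field("ts", Types.SQL_TIMESTAMP()))
    .inAppendMode()
    .registerTableSource("kafkaSource");
```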

Apache Flink: Performance issue when running many jobs

Submitted by 不问归期 on 2019-12-11 13:45:04

Question: With a high number of Flink SQL queries (100 of the kind below), the Flink command-line client fails with "JobManager did not respond within 600000 ms" on a YARN cluster, i.e. the job is never started on the cluster. The JobManager log has nothing after the last TaskManager start except DEBUG entries of "job with ID 5cd95f89ed7a66ec44f2d19eca0592f7 not found in JobManager", indicating it is likely stuck (creating the ExecutionGraph?). The same works as a standalone Java program locally (high CPU initially
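The "600000 ms" in the message is a client-side wait; in Flink 1.x it appears to be governed by akka.client.timeout, so raising it in flink-conf.yaml can at least rule out a plain timeout while the ExecutionGraph is being built (it does not reduce the cost of translating 100 SQL queries):

```yaml
# flink-conf.yaml: let the client wait longer for the JobManager's answer
akka.client.timeout: 1200 s
```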

Apache Flink: executing a program which extends the RichFlatMapFunction on the remote cluster causes an error

Submitted by 强颜欢笑 on 2019-12-11 13:39:13

Question: I have the following code in Apache Flink. It works fine on the local cluster, while running it on the remote cluster generates a NullPointerException on the line containing the command stack.push(recordPair);. Does anyone know what the reason is? The input dataset is the same for both the local and the remote cluster.

```java
public static class TC extends RichFlatMapFunction<Tuple2<Integer, Integer>, Tuple2<Integer, Integer>> {
    private static TreeSet<Tuple2<Integer, Integer>> treeSet_duplicate_pair;
```
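A NullPointerException that only appears on the remote cluster fits the static field: each TaskManager runs in its own JVM, so a static collection initialized on the client (or in one JVM) is null where flatMap actually executes. A minimal sketch of the usual fix, assuming stack is another of these static fields: make the collections non-static instance fields and create them in open():

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.TreeSet;

import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.util.Collector;

public static class TC extends RichFlatMapFunction<Tuple2<Integer, Integer>, Tuple2<Integer, Integer>> {

    // Instance fields, not static: each parallel task creates its own copies
    // inside the TaskManager JVM where they are actually used.
    private transient TreeSet<Tuple2<Integer, Integer>> treeSetDuplicatePair;
    private transient Deque<Tuple2<Integer, Integer>> stack;

    @Override
    public void open(Configuration parameters) {
        // Tuple2 is not Comparable, so a comparator is needed (as in the original code).
        treeSetDuplicatePair = new TreeSet<>((a, b) ->
            a.f0.equals(b.f0) ? a.f1.compareTo(b.f1) : a.f0.compareTo(b.f0));
        stack = new ArrayDeque<>();
    }

    @Override
    public void flatMap(Tuple2<Integer, Integer> recordPair, Collector<Tuple2<Integer, Integer>> out) {
        stack.push(recordPair);  // safe now: initialized per task in open()
        // ... rest of the original logic ...
    }
}
```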

Elasticsearch connector works in IDE but not on local cluster

Submitted by 泪湿孤枕 on 2019-12-11 13:36:56

Question: I am trying to write a Twitter stream into an Elasticsearch 2.3 index using the provided Elasticsearch2 connector. Running my job in IntelliJ works fine, but when I run the jar on a local cluster I get the following error:

```
05/09/2016 13:26:58	Job execution switched to status RUNNING.
05/09/2016 13:26:58	Source: Custom Source -> (Sink: Unnamed, Sink: Unnamed, Sink: Unnamed)(1/1) switched to SCHEDULED
05/09/2016 13:26:58	Source: Custom Source -> (Sink: Unnamed, Sink: Unnamed, Sink: Unnamed)
```

Flink: What is the best way to summarize the results from all partitions

Submitted by 青春壹個敷衍的年華 on 2019-12-11 13:16:45

Question: The datastream is partitioned and distributed to the slots for processing, and I can get the result of each partitioned task. What is the best approach to apply some function to those per-partition results and obtain a global summary result?

Updated: I want to implement a data-summary algorithm such as Misra-Gries in Flink. It maintains k counters and updates them as data arrives. Because the data may be large-scale, it's better that each partition has its own k counters and process
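One pattern that fits Misra-Gries well is to compute partial summaries in parallel and merge them at a single point: a non-keyed windowAll always runs with parallelism 1, so it provides a natural global merge step. A self-contained sketch; the per-element Map is a hypothetical stand-in for a real k-counter sketch:

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

public class GlobalSummarySketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        DataStream<String> words = env.fromElements("a", "b", "a", "c", "a", "b");

        // Per-partition partial counts; in a real job this would be the local
        // Misra-Gries sketch maintained by each parallel subtask.
        DataStream<Map<String, Long>> partials = words
            .map(w -> {
                Map<String, Long> m = new HashMap<>();
                m.put(w, 1L);
                return m;
            })
            .returns(Types.MAP(Types.STRING, Types.LONG));

        // windowAll is non-keyed and therefore runs with parallelism 1:
        // all partial summaries meet here and are merged into a global one.
        partials
            .windowAll(TumblingProcessingTimeWindows.of(Time.seconds(5)))
            .reduce((a, b) -> {
                b.forEach((key, cnt) -> a.merge(key, cnt, Long::sum));
                return a;
            })
            .print();

        env.execute("global summary sketch");
    }
}
```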

How to export Flink task or back-pressure-related metrics to Prometheus?

Submitted by 匆匆过客 on 2019-12-11 10:48:46

Question: I followed the reporter instructions to export Flink metrics to Prometheus; however, it seems that by default only JobManager-related metrics are exported, see below. Opening http://localhost:9249/, I just get the following info, with no task or TaskManager-related metrics:

```
# HELP flink_jobmanager_Status_JVM_Memory_Mapped_MemoryUsed MemoryUsed (scope: jobmanager_Status_JVM_Memory_Mapped)
# TYPE flink_jobmanager_Status_JVM_Memory_Mapped_MemoryUsed gauge
flink_jobmanager_Status_JVM_Memory
```
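This is expected with the default setup: every Flink process (the JobManager and each TaskManager) starts its own PrometheusReporter endpoint, so http://localhost:9249/ only shows the metrics of whichever process bound that port, typically the JobManager. Configuring a port range in flink-conf.yaml lets each process on the host bind its own endpoint, and Prometheus then scrapes all of them:

```yaml
metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
# A range, so the JobManager and every TaskManager on the same host get their
# own port (e.g. 9249 for the JM, 9250 for the first TM, and so on).
metrics.reporter.prom.port: 9249-9260
```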

Flink source for periodic updates

Submitted by 百般思念 on 2019-12-11 10:26:18

Question: I'm trying to implement an external config for a long-running Flink job. My idea is to create a custom source that periodically (every 5 minutes) polls a JSON-encoded config from an external service over HTTP. How do I create a source that performs an action every N minutes? How can I rebroadcast this config to all executors?

Answer 1: First, you need to make an event class which defines all the attributes that your event stream has, and then write all the getters, setters and other methods. An example of this class
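Independently of the event-class approach above, the polling source itself can be a plain RichSourceFunction that loops, emits, and sleeps; combined with DataStream.broadcast(), every parallel subtask of the downstream operator then receives each config update. A minimal sketch; HttpConfigSource and fetchConfig are hypothetical names and the HTTP call is stubbed out:

```java
import org.apache.flink.streaming.api.functions.source.RichSourceFunction;

// Hypothetical source: emits a JSON config string every intervalMs milliseconds.
public class HttpConfigSource extends RichSourceFunction<String> {

    private final long intervalMs;
    private volatile boolean running = true;

    public HttpConfigSource(long intervalMs) {
        this.intervalMs = intervalMs;
    }

    @Override
    public void run(SourceContext<String> ctx) throws Exception {
        while (running) {
            String json = fetchConfig();  // stand-in for the HTTP GET against the config service
            synchronized (ctx.getCheckpointLock()) {
                ctx.collect(json);
            }
            Thread.sleep(intervalMs);
        }
    }

    @Override
    public void cancel() {
        running = false;
    }

    private String fetchConfig() {
        return "{}";  // stub; a real implementation would call the external service
    }
}
```

Wired up as env.addSource(new HttpConfigSource(5 * 60 * 1000L)).broadcast(), each emitted config is replicated to all parallel instances of the operator that consumes it.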

flink: Flink Shell throws NullPointerException

Submitted by 徘徊边缘 on 2019-12-11 09:48:43

Question: I am using the Flink interactive shell to execute WordCount. It works with a file size of 10 MB, but with a 100 MB file the shell throws a NullPointerException:

```
java.lang.NullPointerException
	at org.apache.flink.api.common.accumulators.SerializedListAccumulator.deserializeList(SerializedListAccumulator.java:93)
	at org.apache.flink.api.scala.DataSet.collect(DataSet.scala:549)
	at .<init>(<console>:22)
	at .<clinit>(<console>)
	at .<init>(<console>:7)
	at .<clinit>(<console>)
	at $print(<console>)
	at
```
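DataSet.collect() ships the entire result back to the client through the accumulator system (that is the SerializedListAccumulator in the trace), and the transfer is bounded by the akka.framesize setting, 10 MB by default in older releases, which is consistent with 10 MB working and 100 MB failing. The usual workaround is to raise akka.framesize in flink-conf.yaml, or simply not to collect a result of that size; a sketch with the Java DataSet API, where counts stands in for the word-count result:

```java
// Instead of:  List<Tuple2<String, Integer>> local = counts.collect();
counts.writeAsText("file:///tmp/wordcount-output");  // placeholder path
env.execute("WordCount");
```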