apache-flink

Is it possible to deserialize an Avro message (consumed from Kafka) without providing a reader schema in ConfluentRegistryAvroDeserializationSchema?

心已入冬 submitted on 2020-04-16 05:40:12
Question: I am using the Kafka connector in Apache Flink to access streams served by Confluent Kafka. Apart from the schema registry URL, ConfluentRegistryAvroDeserializationSchema.forGeneric(...) expects a 'reader' schema. Instead of providing a reader schema, I want to use the writer's schema (looked up in the registry) for reading the message as well, because the consumer will not have the latest schema. FlinkKafkaConsumer010<GenericRecord> myConsumer = new FlinkKafkaConsumer010<>("topic-name",
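One workaround worth sketching, under stated assumptions (the class name, the registry URL, and the choice of AbstractDeserializationSchema as the base are all illustrative, not from the question), is to bypass forGeneric(...) and delegate to Confluent's own KafkaAvroDeserializer, which always decodes with the writer's schema fetched from the registry by the schema id embedded in each record:

import java.util.Collections;
import org.apache.avro.generic.GenericRecord;
import org.apache.flink.api.common.serialization.AbstractDeserializationSchema;
import io.confluent.kafka.serializers.KafkaAvroDeserializer;

// Illustrative schema that decodes every record with its writer's schema,
// looked up in the Confluent registry via the schema id in the message.
public class AvroWriterSchemaDeserializer extends AbstractDeserializationSchema<GenericRecord> {
    private final String registryUrl;
    private transient KafkaAvroDeserializer inner; // not Serializable, so created lazily

    public AvroWriterSchemaDeserializer(String registryUrl) {
        this.registryUrl = registryUrl;
    }

    @Override
    public GenericRecord deserialize(byte[] message) {
        if (inner == null) {
            inner = new KafkaAvroDeserializer();
            inner.configure(
                Collections.singletonMap("schema.registry.url", registryUrl),
                false); // false = value deserializer, not key
        }
        // The topic argument only matters for subject-name strategies; a placeholder works here.
        return (GenericRecord) inner.deserialize("topic-name", message);
    }
}

It could then be plugged in as new FlinkKafkaConsumer010<>("topic-name", new AvroWriterSchemaDeserializer("http://registry:8081"), properties), where the registry URL is again an assumed placeholder; flink-avro should be on the classpath so GenericRecord can be serialized between operators.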

How much overhead is typical when distributing processing?

做~自己de王妃 submitted on 2020-04-11 05:00:11
Question: For impatient readers: this is a work in progress, in which I am asking for help along the way. Please do not judge the tools by my preliminary data, as it may change while I try to get better results. We are in the middle of the decision process on the architecture for a tool to analyse the output of co-simulations. As part of that process I was asked to write a benchmark tool and gather data on the speeds of several distributed processing frameworks. The frameworks I tested are: Apache Spark,

Gracefully shut down a Flink Kafka Consumer at runtime

霸气de小男生 submitted on 2020-03-27 07:22:17
Question: I am using FlinkKafkaConsumer010 with Flink 1.2.0, and the problem I am facing is: is there a way to shut down the entire pipeline programmatically when a certain scenario is seen? One possible solution is to shut down the Kafka consumer source by calling the close() method defined inside FlinkKafkaConsumer010; the pipeline would then shut down as well. For this approach, I create a list that contains references to all FlinkKafkaConsumer010 instances that I created at the
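An alternative mechanism, sketched here with assumed names (StoppableStringSchema and the SHUTDOWN marker are illustrative; the import path is the one used by recent Flink releases, while in 1.2 the interface lives under org.apache.flink.streaming.util.serialization), is a DeserializationSchema whose isEndOfStream() reacts to a sentinel record. When it returns true, the Kafka source finishes and the whole pipeline drains and terminates normally:

import java.nio.charset.StandardCharsets;
import org.apache.flink.api.common.serialization.DeserializationSchema;
import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
import org.apache.flink.api.common.typeinfo.TypeInformation;

// Illustrative schema: the consumer stops cleanly when a sentinel record arrives.
public class StoppableStringSchema implements DeserializationSchema<String> {
    private static final String SHUTDOWN_MARKER = "SHUTDOWN"; // assumed sentinel value

    @Override
    public String deserialize(byte[] message) {
        return new String(message, StandardCharsets.UTF_8);
    }

    @Override
    public boolean isEndOfStream(String nextElement) {
        // Returning true ends the source; downstream operators then drain and finish.
        return SHUTDOWN_MARKER.equals(nextElement);
    }

    @Override
    public TypeInformation<String> getProducedType() {
        return BasicTypeInfo.STRING_TYPE_INFO;
    }
}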

Flink job UnfulfillableSlotRequestException: Could not fulfill slot request. Requested resource profile (ResourceProfile{UNKNOWN}) is unfulfillable

不羁岁月 submitted on 2020-03-26 03:51:56
Question: Flink job submission:

$ ./bin/flink run -m 10.0.2.4:6123 /streaming/mvn-flinkstreaming-scala/mvn-flinkstreaming-scala-1.0.jar
Stream processing!!!!!!!!!!!!!!!!!
org.apache.flink.streaming.api.datastream.DataStreamSink@40ef3420
------------------------------------------------------------
The program finished with the following exception:

org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: No pooled slot available and request to ResourceManager for new slot failed at java
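Since the stack trace is cut off, the root cause can only be guessed at, but this exception typically means no TaskManager slots were available to the ResourceManager. Two flink-conf.yaml entries worth checking, with illustrative default values not taken from the question:

# flink-conf.yaml
taskmanager.numberOfTaskSlots: 2   # slots each TaskManager offers
parallelism.default: 1             # must be coverable by the registered slots

It is also worth confirming in the web UI (port 8081 by default) that at least one TaskManager has actually registered before resubmitting the job.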

Flink job error java.util.concurrent.ExecutionException: org.apache.flink.runtime.client.JobSubmissionException: Failed to submit JobGraph

我只是一个虾纸丫 submitted on 2020-03-25 18:38:32
Question: Flink job submission:

# ./bin/flink run -m 10.0.2.4:6123 /storage/flink-1.10.0/examples/streaming/WordCount.jar --input /storage/flink-1.10.0/test.txt --output /storage/flink-1.10.0/test01.txt
------------------------------------------------------------
The program finished with the following exception:

org.apache.flink.client.program.ProgramInvocationException: The main method caused an error: java.util.concurrent.ExecutionException: org.apache.flink.runtime.client.JobSubmissionException:
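One likely cause, given that the submission targets 10.0.2.4:6123: since Flink 1.5 the CLI submits jobs over the REST endpoint, so with Flink 1.10 the -m flag should point at the REST port (8081 by default) rather than the JobManager RPC port 6123. A hedged retry, assuming the default REST port:

# Point -m at the JobManager's REST endpoint, not the RPC port
./bin/flink run -m 10.0.2.4:8081 /storage/flink-1.10.0/examples/streaming/WordCount.jar \
    --input /storage/flink-1.10.0/test.txt --output /storage/flink-1.10.0/test01.txt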

Flink cluster startup error: [ERROR] Could not get JVM parameters properly

。_饼干妹妹 submitted on 2020-03-25 16:01:26
Question:

$ bin/start-cluster.sh
Starting cluster.
[INFO] 1 instance(s) of standalonesession are already running on centos1.
Starting standalonesession daemon on host centos1.
[ERROR] Could not get JVM parameters properly.
[ERROR] Could not get JVM parameters properly.

I have set $JAVA_HOME on the master and all slaves:

$ echo $JAVA_HOME
/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.242.b08-0.el7_7.x86_64/

Below are the config file settings:

jobmanager.rpc.address: 10.0.2.4
# The RPC port where the
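With Flink 1.10 this error often comes from the startup scripts failing to compute JVM memory parameters because the memory configuration is missing or inconsistent. A minimal sketch of the relevant flink-conf.yaml entries (the sizes are illustrative, not taken from the question):

# flink-conf.yaml -- memory settings the 1.10 start scripts use to derive JVM flags
jobmanager.heap.size: 1024m
taskmanager.memory.process.size: 1728m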

Apache Flink - DataSet API - Side outputs

被刻印的时光 ゝ submitted on 2020-03-25 03:16:50
Question: Does Flink support the side outputs feature in the DataSet (batch) API? If not, how should valid and invalid records be handled when loading from a file?

Answer 1: You can always do something like this:

DataSet<EventOrInvalidRecord> goodAndBadTogether = input.map(new CreateObjectIfPossible());
goodAndBadTogether.filter(new KeepOnlyGood())...
goodAndBadTogether.filter(new KeepOnlyBad())...

Another reasonable option in some cases is to go ahead and use the DataStream API, even if you don't have streaming sources.
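A fleshed-out sketch of the answer's pattern, under stated assumptions: the EventOrInvalidRecord wrapper, the parse logic, and the sample input are invented here, since the answer only names the classes.

import org.apache.flink.api.common.functions.FilterFunction;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;

public class GoodBadSplit {

    // Assumed wrapper: holds the parsed value, or null plus the raw line when parsing failed.
    public static class EventOrInvalidRecord {
        public Integer value;
        public String rawLine;
        public EventOrInvalidRecord() {}
        public EventOrInvalidRecord(Integer value, String rawLine) {
            this.value = value;
            this.rawLine = rawLine;
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        DataSet<String> input = env.fromElements("1", "2", "oops", "4");

        // The answer's CreateObjectIfPossible: parse when possible, keep the raw line otherwise.
        DataSet<EventOrInvalidRecord> goodAndBadTogether = input.map(
            new MapFunction<String, EventOrInvalidRecord>() {
                @Override
                public EventOrInvalidRecord map(String line) {
                    try {
                        return new EventOrInvalidRecord(Integer.parseInt(line), line);
                    } catch (NumberFormatException e) {
                        return new EventOrInvalidRecord(null, line);
                    }
                }
            });

        // KeepOnlyGood / KeepOnlyBad as filters over the combined data set.
        goodAndBadTogether.filter(new FilterFunction<EventOrInvalidRecord>() {
            @Override
            public boolean filter(EventOrInvalidRecord r) { return r.value != null; }
        }).print();

        goodAndBadTogether.filter(new FilterFunction<EventOrInvalidRecord>() {
            @Override
            public boolean filter(EventOrInvalidRecord r) { return r.value == null; }
        }).print();
    }
}

Note that each print() on a DataSet triggers its own execution of the plan, which is fine for a sketch; a real job would write each branch to its own sink instead.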

How does Flink decide when to take a checkpoint?

扶醉桌前 submitted on 2020-03-23 08:22:13
Question: I'd like to understand what determines when checkpoints are taken. How does this relate to the checkpointing interval?

Answer 1: To a first approximation, the checkpoint coordinator (part of the JobManager) uses the checkpoint interval to determine when to start a new checkpoint. This interval is passed when you enable checkpointing; e.g., here it is set to wait 10 seconds between checkpoints: env.enableCheckpointing(10000L); it can also be set via execution.checkpointing.interval. However,
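A slightly fuller configuration sketch (the specific values are illustrative; the setters shown are part of Flink's CheckpointConfig API):

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointIntervalExample {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Ask the checkpoint coordinator to start a checkpoint every 10 seconds.
        env.enableCheckpointing(10000L);

        // Leave at least 5 seconds between the end of one checkpoint and the
        // start of the next, even when checkpoints are slow to complete.
        env.getCheckpointConfig().setMinPauseBetweenCheckpoints(5000L);

        // Abort any checkpoint that has not completed within 60 seconds.
        env.getCheckpointConfig().setCheckpointTimeout(60000L);
    }
}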

Using a Cassandra database query as the source for a Flink program

馋奶兔 submitted on 2020-03-05 04:56:10
Question: I have a Cassandra database whose data has to reach my Flink program over a socket, like a stream, for stream processing. So I wrote a simple client program that reads data from Cassandra and sends it to the socket; I also wrote the Flink program as the server side. In fact, my client program is simple and does not use any Flink instructions; it just sends a Cassandra row in string format to the socket, and the server must receive the row. First, I run the Flink program to listen for the client, and then
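For the setup described here, Flink's built-in socket source is the usual fit. A minimal sketch, assuming host localhost, port 9999, and a pass-through print(), none of which come from the truncated question. One caveat: socketTextStream connects out as a client, so the program that forwards Cassandra rows must be the side that listens on the port.

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CassandraSocketReceiver {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Connects to the forwarding program, which must already be listening
        // on this host/port, and reads newline-delimited rows as strings.
        DataStream<String> rows = env.socketTextStream("localhost", 9999);

        // Placeholder processing: just echo each received Cassandra row.
        rows.print();

        env.execute("Cassandra-over-socket receiver");
    }
}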