flink-streaming

How does Flink scale for hot partitions?

Submitted by ε祈祈猫儿з on 2021-02-20 02:46:54
Question: If I have a use case where I need to join two streams or aggregate some kind of metric from a single stream, and I use keyed streams to partition the events, how does Flink handle the operations for hot partitions, where the data might not fit into memory and needs to be split across partitions? Source: https://stackoverflow.com/questions/66273158/how-does-flink-scale-for-hot-partitions

flink program behaves differently in parallelism

Submitted by こ雲淡風輕ζ on 2021-02-19 08:55:07
Question: I am using Flink 1.4.1 with CEP. I have to calculate the lifetime order amount for the same user on each order. So I send orders (Order A -> amount: 500, Order B -> amount: 200, Order C -> amount: 300) and calculate per-user totals using keyed state. Sometimes Order B shows 700 and sometimes 200; that is, sometimes Order A's amount gets added to B and sometimes it does not. I am running the code with a parallelism of 6. Is this a parallelism issue or a distributed-state issue? When I run the whole program with …

Flink taskmanager out of memory and memory configuration

Submitted by 馋奶兔 on 2021-02-18 17:37:09
Question: We are using Flink streaming to run a few jobs on a single cluster. Our jobs use RocksDB to hold state. The cluster is configured to run with a single JobManager and 3 TaskManagers on 3 separate VMs. Each TM is configured to run with 14GB of RAM; the JM is configured to run with 1GB. We are experiencing two memory-related issues:
- When running a TaskManager with an 8GB heap allocation, the TM ran out of heap memory and we got a heap out-of-memory exception. Our solution to this problem was …
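For this kind of setup, a common remedy is to size the TaskManager's total process memory explicitly and let RocksDB use the managed (off-heap) portion, rather than giving most of the VM to the JVM heap. The following flink-conf.yaml fragment is an illustrative sketch only — it assumes Flink 1.10 or later (the unified memory model; older versions use taskmanager.heap.size), and the specific values are assumptions for a 14GB VM, not a recommendation:

```yaml
# flink-conf.yaml -- illustrative sketch; values are assumptions for a 14GB VM.
# Requires Flink >= 1.10 (unified memory model).
taskmanager.memory.process.size: 12g        # total TM process memory, leaving headroom for the OS
taskmanager.memory.managed.fraction: 0.4    # share of Flink memory given to managed (off-heap) memory, used by RocksDB
taskmanager.memory.jvm-overhead.max: 1g     # cap for JVM overhead (thread stacks, GC, native allocations)
```

With RocksDB state kept in managed off-heap memory, the heap can stay small enough to avoid the out-of-memory exceptions described above.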

Using Broadcast State To Force Window Closure Using Fake Messages

Submitted by China☆狼群 on 2021-02-11 15:31:47
Question: Currently I am working on using Flink with an IoT setup. Essentially, devices send data such as (device_id, device_type, event_timestamp, etc.), and I don't have any control over when the messages get sent. I then key the stream by device_id and device_type to perform aggregations. I would like to use event time, given that it ensures the timers which are set fire deterministically after a failure. However, given that this isn't always a high-throughput stream, a …

Is it possible to recover after losing the checkpoint coordinator

Submitted by 自作多情 on 2021-02-11 13:23:29
Question: I'm using incremental checkpoints with RocksDB and saving the checkpoints to a remote destination (S3 in my case). What will happen if someone deletes the JobManager server (where the checkpoint coordinator operates) and reinstalls it? By losing the checkpoint coordinator, do I also lose the option to recover state from the checkpoints? Because from what I know, the coordinator holds all the references to the checkpoints. Answer 1: If you run Flink with high availability enabled, then Flink will …
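High availability works here because the checkpoint references the coordinator holds are also persisted in durable storage, so a replacement JobManager can recover them. A minimal ZooKeeper-based HA sketch in flink-conf.yaml — the quorum addresses and the S3 path are assumptions for illustration:

```yaml
# flink-conf.yaml -- minimal ZooKeeper HA sketch; addresses and paths are assumptions.
high-availability: zookeeper
high-availability.zookeeper.quorum: zk1:2181,zk2:2181,zk3:2181
# Durable storage for JobManager metadata, including the pointers to
# completed checkpoints, so a reinstalled JobManager can resume from
# the latest completed checkpoint.
high-availability.storageDir: s3://my-bucket/flink/ha/
```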

How to unit test a Flink ProcessFunction?

Submitted by 眉间皱痕 on 2021-02-10 05:20:47
Question: I have a simple ProcessFunction that takes a String as input and gives a String as output. How do I unit test this using JUnit, given that the processElement method is void and returns no value?

public class SampleProcessFunction extends ProcessFunction<String, String> {
    @Override
    public void processElement(String content, Context context, Collector<String> collector) throws Exception {
        String output = content + "output";
        collector.collect(output);
    }
}

Answer 1: In order to unit test this method, …
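One way to make the void processElement observable in a test is to capture what the Collector receives. The sketch below is self-contained on purpose: it uses a local stand-in for Flink's Collector interface so it runs without a Flink runtime. In a real project you would instead use Flink's own test utilities (for example the ProcessFunction test harnesses shipped in the flink-streaming-java test artifacts); the names here are only a pattern, not Flink's actual API.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: test a void processElement by asserting on what the collector
// captured. Collector/ListCollector below are local stand-ins mirroring
// org.apache.flink.util.Collector's collect() contract, so this compiles
// and runs without any Flink dependency.
public class SampleProcessFunctionTest {

    // Stand-in for Flink's Collector interface.
    interface Collector<T> {
        void collect(T record);
    }

    // Collector that stores every emitted record in a list for assertions.
    static class ListCollector<T> implements Collector<T> {
        final List<T> collected = new ArrayList<>();
        @Override
        public void collect(T record) {
            collected.add(record);
        }
    }

    // The logic under test, mirroring the question's SampleProcessFunction.
    static void processElement(String content, Collector<String> out) {
        out.collect(content + "output");
    }

    public static void main(String[] args) {
        ListCollector<String> out = new ListCollector<>();
        processElement("test-", out);
        // The void method's effect is observable through the collector.
        if (!out.collected.equals(List.of("test-output"))) {
            throw new AssertionError("unexpected output: " + out.collected);
        }
        System.out.println("passed: " + out.collected);
    }
}
```

The same capture-and-assert pattern carries over directly to a JUnit @Test method: pass a list-backed collector, call processElement, then assert on the list's contents.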

Flink checkpoints keeps failing

Submitted by 陌路散爱 on 2021-02-09 08:01:50
Question: We are trying to set up a Flink stateful job using the RocksDB backend. We are using a session window with a 30-minute gap. We use an AggregateFunction, so we are not using any Flink state variables. With sampling, we have fewer than 20k events/s and 20 to 30 new sessions/s. Our sessions basically gather all the events, so the size of the session accumulator grows over time. We are using 10GB of memory in total with Flink 1.9 and 128 containers. The following are the settings: state.backend: rocksdb, state.checkpoints.dir: hdfs: …
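The truncated settings above can be sketched as a flink-conf.yaml fragment like the following — the HDFS path and the local directory are assumptions filled in for illustration, not the asker's actual values:

```yaml
# flink-conf.yaml -- sketch of a RocksDB checkpointing setup; path and
# directory values are illustrative assumptions.
state.backend: rocksdb
state.checkpoints.dir: hdfs:///flink/checkpoints
state.backend.incremental: true               # upload only changed SST files per checkpoint
state.backend.rocksdb.localdir: /tmp/rocksdb  # fast local disk for RocksDB working state
```

Incremental checkpointing is usually relevant for this workload, since an ever-growing session accumulator makes full checkpoints increasingly expensive.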

How to properly test a Flink window function?

Submitted by 被刻印的时光 ゝ on 2021-02-08 09:49:20
Question: Does anyone know how to test windowing functions in Flink? I am using the dependency flink-test-utils_2.11. My steps are: get the StreamExecutionEnvironment, create objects and add them to the environment, do a keyBy, add a session window, and execute an aggregate function.

public class AggregateVariantCEVTest extends AbstractTestBase {
    @Test
    public void testAggregateVariantCev() throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1 …
问题 Does anyone know how to test windowing functions in Flink ? I am using the dependency flink-test-utils_2.11 . My steps are: Get the StreamExecutionEnvironment Create objects and add to the invironment Do a keyBy add a Session Window execute an aggregate function public class AggregateVariantCEVTest extends AbstractTestBase { @Test public void testAggregateVariantCev() throws Exception { StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); env.setParallelism(1