Why is the parallel execution of an Apache Flink application slower than the sequential execution?

问题

I have an Apache Flink setup with one TaskManager and two processing slots. When I execute an application with parallelism set as 1, the job takes around 33 seconds to execute. When I increase the parallelism to 2, the job takes 45 seconds to complete.

I am using Flink on my Windows machine with the configuration of 10 Compute Cores(4C + 6G). I want to achieve better results with 2 slots. What can I do?

回答1:

Distributed systems like Apache Flink are designed to run in data centers on hundreds of machines. They are not designed to parallelize computations on a single computer. Moreover, Flink targets large-scale problems. Jobs that run in seconds on a local machine are not the primary use case for Flink.

Parallelizing an application always causes overhead. Data has to be distributed and shared between processes and threads. Flink distributes data across TaskManager slots by serializing and deserializing it. Moreover, starting and coordinating distributed tasks also does not come for free.

It is not surprising to observe longer execution times when scaling a small-scale problem with a distributed system on a single machine. You could port the application to a thread-parallel application that leverages shared memory.

来源：https://stackoverflow.com/questions/48986523/why-is-the-parallel-execution-of-an-apache-flink-application-slower-than-the-seq

标签

apache-flink

易学教程内所有资源均来自网络或用户发布的内容，如有违反法律规定的内容欢迎反馈！
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!