Question
I need a functionality in Storm that I know (based on the docs) has not been implemented yet: adding more tasks at runtime, without having to start with a large initial number of tasks, because that might cause performance issues. Running more than one task per executor does not increase the level of parallelism -- an executor always has one thread that it uses for all of its tasks, which means that tasks run serially on an executor.
I know that the rebalance command can be used to add executors and worker processes at runtime, and that there is a rule that #executors <= #tasks, which means the number of tasks must stay static at runtime. But I'm curious how hard it would be (if not impossible) to add this feature to Storm.
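Because of that #executors <= #tasks rule, the common workaround is to oversubscribe tasks when the topology is submitted and later scale executors up to that ceiling with rebalance. A minimal sketch of the submission-time wiring (`WordSpout` and `CountBolt` are hypothetical placeholder components, not part of Storm):

```java
import org.apache.storm.Config;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.tuple.Fields;

TopologyBuilder builder = new TopologyBuilder();
// 2 executors for the spout.
builder.setSpout("word", new WordSpout(), 2);
// Start with 2 executors but 4 tasks: the task count is fixed for the
// topology's lifetime and acts as the upper bound for later rebalancing.
builder.setBolt("count", new CountBolt(), 2)
       .setNumTasks(4)
       .fieldsGrouping("word", new Fields("word"));

Config conf = new Config();
conf.setNumWorkers(2);
```

Later, executors (but not tasks) can be scaled at runtime, e.g. `storm rebalance mytopology -n 4 -e count=4`, which grows the topology to 4 workers and gives the `count` bolt 4 executors, one per pre-allocated task.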
Is there a way to implement this functionality in Storm, or can it not be done at all? If there is a way, please give me a clue how to do it.
Answer 1:
Not sure what you mean by "since those extra tasks run serially".
Tasks in Storm are used to exploit data parallelism. In theory it's possible to add code to change the number of tasks at runtime, but it would be a huge change, and AFAIK there are no plans to add this feature.
See http://storm.apache.org/releases/1.0.3/Understanding-the-parallelism-of-a-Storm-topology.html
Because keys are assigned to tasks hash-based, changing the number of tasks would require rehashing all keys onto the new tasks. If an operator builds up key-based internal state, that state would need to be partitioned by key and redistributed accordingly, too.
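The rehashing problem can be illustrated with a small standalone simulation. This is not Storm's actual grouping code, just the typical hash-mod-N routing rule assumed here for illustration: when the task count changes, most keys map to a different task, so any per-key state would have to move with them.

```java
import java.util.Arrays;
import java.util.List;

public class RehashDemo {
    // Assumed routing rule (hash of the grouping key modulo the task count),
    // as used by a typical fields-grouping-style partitioner.
    static int taskFor(String key, int numTasks) {
        return Math.floorMod(key.hashCode(), numTasks);
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("user-1", "user-2", "user-3", "user-4", "user-5");
        int moved = 0;
        for (String key : keys) {
            int before = taskFor(key, 4); // topology started with 4 tasks
            int after  = taskFor(key, 6); // hypothetically grown to 6 tasks
            if (before != after) moved++;
            System.out.printf("%s: task %d -> task %d%n", key, before, after);
        }
        // Each key that changed task would drag its key-partitioned state along.
        System.out.println("keys that changed task: " + moved + "/" + keys.size());
    }
}
```

Running this shows why dynamic task counts are more than a scheduling change: routing and state placement are coupled through the hash.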
Source: https://stackoverflow.com/questions/42487777/is-it-possible-to-add-tasks-dynamically-at-runtime-in-apache-storm-not-just-reba