How to tune the parallelism hint in Storm

没有蜡笔的小新 2020-12-28 18:07

"parallelism hint" is used in Storm to parallelise a running Storm topology. I know there are concepts like worker processes, executors and tasks. Would it make sense to make the parallelism hint as big as possible so that your topologies are parallelised as much as possible?

3 answers
  •  天涯浪人
    2020-12-28 18:38

    Adding to what @Chiron explained:

    "parallelism hint" is used in storm to parallelise a running storm topology

    Actually, in Storm the term parallelism hint is used to specify the initial number of executors (threads) for a component (spout or bolt), e.g.

        topologyBuilder.setBolt("green-bolt", new GreenBolt(), 2);
    

    The above statement tells Storm to allocate 2 executor threads initially (this can be changed at runtime). Similarly,

        topologyBuilder.setBolt("green-bolt", new GreenBolt(), 2).setNumTasks(4);
    

    the setNumTasks(4) call tells Storm to run 4 associated tasks (this stays the same throughout the lifetime of the topology). So in this case Storm will run two tasks per executor. By default, the number of tasks is set to the same as the number of executors, i.e. Storm will run one task per thread.
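    The executor/task split above can be sketched in plain Java with no Storm dependency (Storm's actual task-to-executor mapping is an implementation detail; a round-robin assignment is used here purely for illustration, with the counts taken from the green-bolt example):

```java
import java.util.ArrayList;
import java.util.List;

public class TaskDistribution {
    // Assign task ids to executors round-robin: each executor thread
    // hosts numTasks / numExecutors tasks when the split is even.
    static List<List<Integer>> assign(int numTasks, int numExecutors) {
        List<List<Integer>> executors = new ArrayList<>();
        for (int e = 0; e < numExecutors; e++) {
            executors.add(new ArrayList<>());
        }
        for (int t = 0; t < numTasks; t++) {
            executors.get(t % numExecutors).add(t);
        }
        return executors;
    }

    public static void main(String[] args) {
        // setBolt("green-bolt", new GreenBolt(), 2).setNumTasks(4):
        // 2 executors, 4 tasks -> 2 tasks per executor thread.
        List<List<Integer>> executors = assign(4, 2);
        for (int e = 0; e < executors.size(); e++) {
            System.out.println("executor " + e + " runs tasks " + executors.get(e));
        }
    }
}
```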

    Would it make sense to make the parallelism hint as big as possible so that your topologies are parallelised as much as possible

    One key thing to note is that running more than one task per executor does not increase the level of parallelism, because an executor uses a single thread to process all of its tasks, i.e. tasks run serially on an executor.
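    This serial behaviour can be demonstrated with the JDK's single-threaded executor, which is a reasonable stand-in for one Storm executor thread (the task labels are made up for the demo):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SerialTasks {
    // Submit several "tasks" to one executor thread and record the order
    // in which they actually ran.
    public static List<String> run() throws InterruptedException {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        List<String> log = new ArrayList<>();
        for (int task = 0; task < 4; task++) {
            final int id = task;
            executor.submit(() ->
                log.add("task-" + id + " on " + Thread.currentThread().getName()));
        }
        executor.shutdown();
        executor.awaitTermination(5, TimeUnit.SECONDS);
        return log;
    }

    public static void main(String[] args) throws InterruptedException {
        // All four tasks run one after another, on the same single thread.
        run().forEach(System.out::println);
    }
}
```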


    The purpose of configuring more than one task per executor is that it lets you change the number of executors (threads) at runtime, via the rebalancing mechanism, while the topology is still running (remember the number of tasks stays the same throughout the life cycle of a topology).
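    A rough sketch of what rebalancing changes, again in plain Java with no Storm dependency (on a real cluster you would use Storm's rebalance command instead; the even split below is an illustrative assumption): the task count fixed by setNumTasks is a ceiling on useful parallelism, so scaling up only spreads the existing tasks over more threads.

```java
public class Rebalance {
    // Tasks hosted by each executor thread, assuming an even split.
    // The task count set by setNumTasks(4) never changes; only the
    // number of executor threads can be rebalanced at runtime.
    static int tasksPerExecutor(int numTasks, int numExecutors) {
        return numTasks / numExecutors;
    }

    public static void main(String[] args) {
        System.out.println("before: "
            + tasksPerExecutor(4, 2) + " tasks per executor"); // 2
        System.out.println("after rebalance to 4 executors: "
            + tasksPerExecutor(4, 4) + " tasks per executor"); // 1
        // Rebalancing beyond 4 executors buys nothing here:
        // an executor cannot run a fraction of a task.
    }
}
```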

    Increasing the number of workers (each responsible for running one or more executors for one or more components) might also give you a performance benefit, but this is also relative, as I found from this discussion where nathanmarz says:

    Having more workers might have better performance, depending on where your bottleneck is. Each worker has a single thread that passes tuples on to the 0mq connections for transfer to other workers, so if you're bottlenecked on CPU and each worker is dealing with lots of tuples, more workers will probably net you better throughput.

    So basically there is no definite answer to this; you should try different configurations based on your environment and design.
