Why does joblib.Parallel() take much more time than a non-paralleled computation? Shouldn't Parallel() run faster than a non-paralleled computation?

Posted by 风流意气都作罢 on 2019-12-06 07:38:29

Q : Shouldn't Parallel() become faster than a non-paralleled computation?

Well, that depends a lot on the circumstances ( be it joblib.Parallel() or any other way of going parallel ).

There are no benefits that would ever come for free ( All such promises failed to deliver, since 1917 ... )

Plus,
it is very easy to end up
paying way more ( for spawning the processes that multiprocessing needs )
than you receive back ( the speedup expected over the original workflow ) ... so due care is a must.


The best first step:

Revisit Amdahl's law together with its revisions and the criticism concerning process-scheduling effects ( the speedup achieved from reorganising process-flows and using, at least in some part, parallel process-scheduling ).

The original Amdahl's formulation was not explicit about the so-called add-on "costs" one has to pay for going into parallel work-flows, costs that are not in the budget of the original, pure-[SERIAL] flow-of-work.
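
As a rough illustration of what such add-on costs do to the expected speedup, here is a minimal sketch of an overhead-aware variant of Amdahl's formula. The function name, its parameters and the example inputs are assumptions made for illustration and do not come from the original post; the overhead is expressed as a fraction of the pure-[SERIAL] run time, and overhead = 0.0 gives back the classic formula.

```python
def speedup_with_overhead(p, n, overhead=0.0):
    """Estimate the speedup relative to a pure-[SERIAL] run normalised to 1.0.

    p        -- fraction of the serial run time that can run in parallel (0..1)
    n        -- number of parallel workers (>= 1)
    overhead -- add-on costs (spawning, data transfers), as a fraction of
                the serial run time; 0.0 gives the classic Amdahl formula
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if n < 1:
        raise ValueError("n must be >= 1")
    if overhead < 0.0:
        raise ValueError("overhead must be >= 0")
    return 1.0 / ((1.0 - p) + p / n + overhead)


if __name__ == "__main__":
    # Illustrative (made-up) inputs: 90 % parallelisable work, 2 workers.
    print(speedup_with_overhead(0.9, 2))        # no add-on costs
    print(speedup_with_overhead(0.9, 2, 0.5))   # add-on costs = 50 % of serial time
```

In this simplified model the result drops below 1.0 ( a slowdown ) as soon as the overhead exceeds p * ( 1 - 1/n ), however many workers are used.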

1) Process instantiation has always been expensive in Python, as it first has to create as many copies as requested ( O/S-driven RAM-allocations sized for n_jobs(2) copies + O/S-driven copying of the RAM-image of the main python session ). Thread-based backends yield a negative speedup for compute-intensive work, as the GIL-lock still re-[SERIAL]-ises the work-steps among all spawned threads, so you gain nothing, while you have paid immense add-on costs for spawning + for each add-on GIL-acquire / GIL-release step. This is an awful antipattern for compute-intensive tasks: it may help mask some cases of I/O-related latencies, but it is definitely not a fit for computing-intensive workloads.

2) Add-on costs for parameters' transfer - you have to move some data from the main process to the new ones. It takes add-on time and you have to pay this add-on cost, which is not present in the original, pure-[SERIAL] workflow.

3) Add-on costs for results' return transfer - you have to move some data from the new processes back to the originating (main) process. It takes add-on time and you have to pay this add-on cost, which is not present in the original, pure-[SERIAL] workflow.

4) Add-on costs for any data interchange ( better resist any temptation to use this in parallel workflows - why? a) It blocks + b) It is expensive and you have to pay even more add-on costs for getting any further, which you do not pay in a pure-[SERIAL] original workflow ).
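
The transfer costs in 2) and 3) can be approximated on one's own machine. The sketch below assumes that parameters and results cross the process boundary in pickled form, which is how Python's process-based backends typically move data; it times only the serialise + deserialise step with the standard pickle module, so the inter-process transport itself is not included and the figures are a lower bound. The function name and the payload sizes are hypothetical.

```python
import pickle
import statistics
import time


def pickle_roundtrip_us(payload, repeats=100):
    """Median time in [us] to serialise and deserialise `payload` once."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        pickle.loads(pickle.dumps(payload, protocol=pickle.HIGHEST_PROTOCOL))
        samples.append((time.perf_counter() - start) * 1e6)
    return statistics.median(samples)


if __name__ == "__main__":
    # Hypothetical parameter sets of growing size.
    for size in (1, 1_000, 100_000):
        payload = list(range(size))
        print(f"{size:>7} ints: {pickle_roundtrip_us(payload):10.1f} [us]")
```

Comparing these figures with the time the useful work on the same payload takes gives a first hint whether sending it to another process can pay off at all.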


Q : Why does joblib.Parallel() take much more time than non-paralleled computation?

Simply, because you have to pay way, way more to launch the whole orchestrated circus than you will receive back from such a parallel work-flow organisation. There is too small an amount of work in math.sqrt( <int> ) to ever justify the relatively immense costs of spawning 2 full copies of the original python-(main)-session + all the orchestration of dances needed to send each and every ( <int> ) from (main) there and to retrieve each resulting ( <float> ) from the ( joblib.Parallel()-process ) back to (main).

Your raw benchmarking times provide a sufficient comparison of the accumulated costs of producing the same result:

[SERIAL]-<iterator> feeding a [SERIAL]-processing storing into list[]:  0.51 [s]
[SERIAL]-<iterator> feeding [PARALLEL]-processing storing into list[]: 31.39 [s]

A raw estimate says about 30.9 seconds were "wasted" on doing the same ( small ) amount of work, just by forgetting about the add-on costs one always has to pay.
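
The same arithmetic, spelled out as a short sketch. The two timings are the ones reported above and N_JOBS = 2 is the worker count mentioned earlier; the "ideal" parallel time is an assumption ( a perfect split of all work with zero add-on costs ), used only to show how large the implied overhead is.

```python
T_SERIAL = 0.51      # [s] measured pure-[SERIAL] run, from the benchmark above
T_PARALLEL = 31.39   # [s] measured joblib.Parallel() run, from the benchmark above
N_JOBS = 2           # number of worker processes mentioned above

observed_speedup = T_SERIAL / T_PARALLEL   # below 1.0 means a slowdown
slowdown_factor = T_PARALLEL / T_SERIAL
wasted = T_PARALLEL - T_SERIAL             # [s] extra wall-clock time

# Assumption: a perfect split of all work over N_JOBS, zero add-on costs.
ideal_parallel = T_SERIAL / N_JOBS
implied_overhead = T_PARALLEL - ideal_parallel

print(f"observed speedup : {observed_speedup:.4f}")
print(f"slowdown factor  : {slowdown_factor:.1f} x")
print(f"extra time       : {wasted:.2f} [s]")
print(f"implied overhead : {implied_overhead:.3f} [s] (ideal-split assumption)")
```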


So, how to measure How Much You Have To Pay ... before you actually have to pay it ...?

Benchmark, benchmark, benchmark the actual code ... (prototype)

If interested in benchmarking these costs, i.e. how long it takes in [us] to do 1), 2) or 3) ( How Much You Have To Pay before any useful work even starts ): benchmarking templates have been posted for testing and validating these principal costs on one's own platform. Only then can one decide what minimum work-package justifies these unavoidable expenses and yields a "positive" speedup greater ( best a lot greater ) than 1.0000 when compared to the pure-[SERIAL] original.
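
The templates referred to above are not reproduced here. As a minimal stand-in for cost 1), the sketch below times how long it takes to start and stop a fresh Python interpreter via the standard subprocess module, which is only a rough proxy for what a process-based backend pays per worker. Both function names are hypothetical, and the break-even estimate rests on a simplified assumption: all workers start concurrently, the work splits perfectly, and transfer costs are ignored.

```python
import statistics
import subprocess
import sys
import time


def interpreter_startup_us(repeats=5):
    """Median wall-clock time in [us] to start and stop a fresh Python process."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run([sys.executable, "-c", "pass"], check=True)
        samples.append((time.perf_counter() - start) * 1e6)
    return statistics.median(samples)


def break_even_work_us(startup_us, n_jobs):
    """Serial work in [us] above which work/n_jobs + startup beats the serial run.

    Simplified model (assumption): parallel time = work / n_jobs + startup_us.
    """
    if n_jobs < 2:
        raise ValueError("n_jobs must be >= 2 for any speedup to be possible")
    return startup_us * n_jobs / (n_jobs - 1)


if __name__ == "__main__":
    startup = interpreter_startup_us()
    print(f"process start-up     : {startup:12.0f} [us]")
    print(f"break-even (n_jobs=2): {break_even_work_us(startup, 2):12.0f} [us]")
```

Any work-package smaller than the break-even figure cannot reach a speedup above 1.0 in this model, and the real threshold is higher once the transfer costs 2) and 3) are added.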
