OpenMP performance

伪装坚强ぢ 2020-12-13 14:33

Firstly, I know this [type of] question is frequently asked, so let me preface this by saying I've read as much as I can, and I still don't know what the deal is.

3 Answers
  •  执笔经年
    2020-12-13 15:08

    Since the threads don't actually interact, you could just change the code to multiprocessing. You would only have message passing at the end, and it would be guaranteed that the workers never need to synchronize anything.

    Here's Python 3.2 code that does essentially that (you'll likely not want to do it in pure Python for performance reasons; instead, put the for-loop into a C function and bind it via Cython. You'll see from the code why I show it in Python anyway):

    from concurrent import futures
    from my_cython_module import huge_function

    # ntest independent inputs: each call is expensive, the inputs are tiny,
    # and the calls don't interact.
    parameters = range(ntest)

    # Four worker processes; since nothing is shared, nothing has to synchronize.
    with futures.ProcessPoolExecutor(4) as e:
        results = e.map(huge_function, parameters)
        shared_array = list(results)


    That's it. Increase the number of processes to the number of jobs you can put on the cluster, and let each process simply submit and monitor a job; that scales to any number of calls.

    Huge functions with small input values and no interaction practically call out for multiprocessing. And once you have that, moving up to MPI (with almost unlimited scaling) is not too hard.
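    For illustration, a minimal sketch of that step using mpi4py might look like the following (the choice of mpi4py is my assumption, not something the answer prescribes; huge_function and ntest are reused from the snippet above). Each rank computes a contiguous chunk of the parameter range, and rank 0 gathers the pieces.

    from mpi4py import MPI
    from my_cython_module import huge_function

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    size = comm.Get_size()

    # Split the ntest inputs into one contiguous chunk per rank; no interaction needed.
    chunk = range(rank * ntest // size, (rank + 1) * ntest // size)
    local_results = [huge_function(i) for i in chunk]

    # Rank 0 collects the per-rank lists; concatenating them preserves input order.
    gathered = comm.gather(local_results, root=0)
    if rank == 0:
        shared_array = [r for part in gathered for r in part]

    Run it with something like mpirun -n 4 python script.py and raise the rank count to scale out.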

    From the technical side, AFAIK context switches on Linux are quite expensive (a monolithic kernel with a lot of kernel-space state), while they are much cheaper on OS X or the Hurd (Mach microkernel). That might explain the large amount of system time you see on Linux but not on OS X.
