What is some example code for demonstrating multicore speedup in Python on Windows?

Submitted by 梦想的初衷 on 2019-11-26 14:33:35

Question


I'm using Python 3 on Windows and trying to construct a toy example demonstrating how using multiple CPU cores can speed up computation. The toy example is rendering the Mandelbrot fractal.
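For concreteness, here is a minimal sketch of the kind of per-pixel work I have in mind (the name calculatePixel matches what I use further down; treating dataarray as a grid of precomputed complex coordinates is just my setup, nothing canonical):

    def calculatePixel(j, k, dataarray):
        # standard escape-time iteration for the complex point at pixel (j, k)
        c = dataarray[j][k]
        z = 0j
        for i in range(255):
            z = z * z + c
            if abs(z) > 2.0:
                return i  # escaped: use the iteration count as the colour
        return 255  # did not escape: treat as inside the set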

So far:

  • I have avoided threading, since the Global Interpreter Lock prevents Python threads from running on multiple cores in this context (see the sketch after the next paragraph)
  • I'm ditching example code that won't work on Windows, since Windows lacks the forking capability of Linux
  • I'm trying the "multiprocessing" package. I declare p=Pool(8) (8 is my number of cores) and use p.starmap(..) to delegate work. This is supposed to spawn multiple "subprocesses" which Windows will automatically schedule onto different CPUs

However, I'm unable to demonstrate any speedup, whether due to overhead or to no actual multiprocessing taking place. Pointers to toy examples with demonstrable speedup would therefore be very helpful :-)
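To make the first bullet point concrete, here is a minimal sketch (the busy() stand-in is mine, not part of the Mandelbrot code) that times the same CPU-bound work serially and on a thread pool; on CPython the threaded version comes out no faster, which is what pushed me toward multiprocessing:

    import time
    from concurrent.futures import ThreadPoolExecutor

    def busy(n):
        # pure-Python CPU-bound work; the GIL serializes this across threads
        total = 0
        for i in range(n):
            total += i * i
        return total

    if __name__ == "__main__":
        jobs = [2_000_000] * 8

        start = time.perf_counter()
        for n in jobs:
            busy(n)
        print("serial:    {:.2f}s".format(time.perf_counter() - start))

        start = time.perf_counter()
        with ThreadPoolExecutor(max_workers=4) as pool:
            list(pool.map(busy, jobs))
        print("4 threads: {:.2f}s".format(time.perf_counter() - start))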

Edit: Thank you! This pushed me in the right direction and I've now got a working example that demonstrates a doubling of speed on a CPU with 4 cores.
A copy of my code with "lecture notes" here: https://pastebin.com/c9HZ2vAV

I settled on using Pool(), but will later try out the "Process" alternative that @16num pointed out (a rough sketch of that follows the code). Below is a code example for Pool():

    from functools import partial
    from multiprocessing import Pool, cpu_count

    p = Pool(cpu_count())

    # map() passes only a single argument to the worker function;
    # "partial" freezes the extra dataarray argument, and starmap()
    # unpacks each (j, k) coordinate tuple into the remaining arguments
    partial_calculatePixel = partial(calculatePixel, dataarray=data)
    koord = []
    for j in range(height):
        for k in range(width):
            koord.append((j, k))

    # Runs the calls to calculatePixel in the pool; "hmm" collects the output
    hmm = p.starmap(partial_calculatePixel, koord)
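For later reference, here is a rough sketch of what the Process alternative might look like (my own guess at the shape, not @16num's exact code; calculatePixel, data, height and width come from my Mandelbrot script), with each worker taking an interleaved slice of the rows and reporting back through a Queue:

    from multiprocessing import Process, Queue, cpu_count

    def worker(rows, width, data, out):
        # each process computes its share of rows and sends the results back
        results = [(j, k, calculatePixel(j, k, dataarray=data))
                   for j in rows for k in range(width)]
        out.put(results)

    if __name__ == "__main__":
        out = Queue()
        n = cpu_count()
        procs = [Process(target=worker,
                         args=(range(j, height, n), width, data, out))
                 for j in range(n)]
        for proc in procs:
            proc.start()
        # drain the queue before join() so the workers' feeder threads can exit
        pixels = [item for _ in procs for item in out.get()]
        for proc in procs:
            proc.join()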

Answer 1:


It's very simple to demonstrate a multiprocessing speedup:

import multiprocessing
import time

# cross-platform high-precision clock (time.clock was removed in Python 3.8)
get_timer = time.perf_counter

def cube_function(num):
    time.sleep(0.01)  # simulate the CPU core taking ~10ms to cube the number
    return num**3

if __name__ == "__main__":  # multiprocessing guard
    # we'll test multiprocessing with pools from one to the number of CPU cores on the system
    # it won't show significant improvements after that and it will soon start going
    # downhill due to the underlying OS thread context switches
    for workers in range(1, multiprocessing.cpu_count() + 1):
        pool = multiprocessing.Pool(processes=workers)
        # let's 'warm up' our pool so it doesn't affect our measurements
        pool.map(cube_function, range(multiprocessing.cpu_count()))
        # now to business: we'll have 10000 numbers to cube via our expensive function
        print("Cubing 10000 numbers over {} processes:".format(workers))
        timer = get_timer()  # time measuring starts now
        results = pool.map(cube_function, range(10000))  # map our range to the cube_function
        timer = get_timer() - timer  # get our delta time as soon as it finishes
        print("\tTotal: {:.2f} seconds".format(timer))
        print("\tAvg. per process: {:.2f} seconds".format(timer / workers))
        pool.close()  # let's clear out our pool for the next run
        pool.join()  # wait for the workers to exit so cleanup doesn't skew the next measurement

Of course, we're just simulating 10ms-per-number calculations here; you can replace cube_function with anything CPU-taxing for a real-world demonstration. The results are as expected:

Cubing 10000 numbers over 1 processes:
        Total: 100.01 seconds
        Avg. per process: 100.01 seconds
Cubing 10000 numbers over 2 processes:
        Total: 50.02 seconds
        Avg. per process: 25.01 seconds
Cubing 10000 numbers over 3 processes:
        Total: 33.36 seconds
        Avg. per process: 11.12 seconds
Cubing 10000 numbers over 4 processes:
        Total: 25.00 seconds
        Avg. per process: 6.25 seconds
Cubing 10000 numbers over 5 processes:
        Total: 20.00 seconds
        Avg. per process: 4.00 seconds
Cubing 10000 numbers over 6 processes:
        Total: 16.68 seconds
        Avg. per process: 2.78 seconds
Cubing 10000 numbers over 7 processes:
        Total: 14.32 seconds
        Avg. per process: 2.05 seconds
Cubing 10000 numbers over 8 processes:
        Total: 12.52 seconds
        Avg. per process: 1.57 seconds

Now, why not 100% linear? Well, first of all, it takes some time to map/distribute the data to the sub-processes and to get it back, there is some cost to context switching, other tasks use my CPUs from time to time, and time.sleep() is not exactly precise (nor could it be on a non-RT OS)... But the results are roughly in the ballpark expected for parallel processing.
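If you swap in a heavier real-world payload, one knob for trimming that distribution overhead is the optional chunksize argument of Pool.map(), which controls how many items are shipped to a worker per round-trip (the 100 below is just an illustrative value, not something tuned):

# larger chunks mean fewer parent/worker round-trips; the trade-off is
# coarser load balancing near the end of the run
results = pool.map(cube_function, range(10000), chunksize=100)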



Source: https://stackoverflow.com/questions/44521931/what-is-some-example-code-for-demonstrating-multicore-speedup-in-python-on-windo
