How to multiprocess for loops in python where each calculation is independent?

Submitted by 血红的双手。 on 2021-01-29 08:32:51

Question


I'm trying to learn something a little new in each mini-project I do. I've made a Game of Life (https://en.wikipedia.org/wiki/Conway%27s_Game_of_Life) program.

This involves a numpy array where each point in the array (a "cell") has an integer value. To evolve the state of the game, you have to compute for each cell the sum of all its neighbour values (8 neighbours).

The relevant class in my code is as follows, where evolve() takes in one of the xxx_method methods. It works fine for conv_method and loop_method, but I want to use multiprocessing (which I've identified should work for this, unlike multithreading?) on loop_method to see any performance increase. I feel it should work, as each calculation is independent. I've tried a naive approach, but I don't really understand the multiprocessing module well enough. Could I also use it within the evolve() method? Again, I feel that each calculation within the double for loop is independent.

Any help appreciated, including general code comments.

Edit - I'm getting a RuntimeError, which I'm half-expecting, as my understanding of multiprocessing isn't good enough. What needs to be done to the code to get it to work?

import numpy as np
from scipy.signal import correlate2d
from multiprocessing import Process, cpu_count

class GoL:
    """ Game Engine """
    def __init__(self, size):
        self.size = size
        self.grid = Grid(size) # Grid is another class I've defined

    def evolve(self, neighbour_sum_func):
        new_grid = np.zeros_like(self.grid.cells) # start with everything dead; only need to test for keeping/turning cells alive
        neighbour_sum_array = neighbour_sum_func()
        for i in range(self.size):
            for j in range(self.size):
                cell_sum = neighbour_sum_array[i,j]
                if self.grid.cells[i,j]: # already alive
                    if cell_sum == 2 or cell_sum == 3:
                        new_grid[i,j] = 1
                else: # test for dead coming alive
                    if cell_sum == 3:
                        new_grid[i,j] = 1

        self.grid.cells = new_grid

    def conv_method(self):
        """ Uses 2D convolution across the entire grid to work out the neighbour sum at each cell """
        kernel = np.array([
                            [1,1,1],
                            [1,0,1],
                            [1,1,1]],
                            dtype=int)
        neighbour_sum_grid = correlate2d(self.grid.cells, kernel, mode='same')
        return neighbour_sum_grid

    def loop_method(self, partition=None):
        """ Also works out neighbour sum for each cell, using a more naive loop method """
        if partition is None:
            cells = self.grid.cells # no multithreading, just work on entire grid
        else:
            cells = partition # just work on a set section of the grid

        neighbour_sum_grid = np.zeros_like(cells) # same shape as cells, all zeros
        for i, row in enumerate(cells):
            for j, cell_val in enumerate(row):
                # clamp the lower bounds at 0 so i-1 == -1 doesn't wrap to the far edge
                neighbours = cells[max(i-1, 0):i+2, max(j-1, 0):j+2]
                neighbour_sum = np.sum(neighbours) - cell_val
                neighbour_sum_grid[i,j] = neighbour_sum
        return neighbour_sum_grid

    def multi_loop_method(self):
        cores = cpu_count()
        procs = []
        slices = []
        if cores == 2: # for my VM; need to implement a generalised method for more cores
            half_grid_point = int(SQUARES / 2)
            slices.append(self.grid.cells[0:half_grid_point])
            slices.append(self.grid.cells[half_grid_point:])
        else:
            raise Exception("only implemented for 2 cores")

        for sl in slices:
            proc = Process(target=self.loop_method, args=(sl,))
            proc.start()
            procs.append(proc)

        for proc in procs:
            proc.join()

Answer 1:


I want to use multiprocessing (which I've identified should work, unlike multithreading?)

Multithreading would not help, because in CPython all threads share a single interpreter lock (the GIL), so pure-Python computation still runs on one core, and that core is your current bottleneck. Multithreading is useful when you are waiting for something external, e.g. an API to respond; other threads can run in the meantime. But in Conway's Game of Life your program is constantly computing.
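To see the difference, here is a minimal sketch (mine, not from the original thread): the same CPU-bound function is run four times through a thread pool and a process pool. The function `cpu_bound` and the workload size are illustrative assumptions; on a multi-core machine the process pool finishes in roughly the serial time divided by the core count, while the thread pool takes roughly the full serial time.

```python
import time
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def cpu_bound(n):
    # pure-Python arithmetic: holds the GIL the whole time it runs
    total = 0
    for i in range(n):
        total += i * i
    return total

def timed(executor_cls, n_tasks=4, n=2_000_000):
    """Run n_tasks copies of cpu_bound through the given executor and time it."""
    start = time.perf_counter()
    with executor_cls(max_workers=n_tasks) as ex:
        results = list(ex.map(cpu_bound, [n] * n_tasks))
    return time.perf_counter() - start, results

if __name__ == "__main__":  # guard is required for ProcessPoolExecutor on spawn platforms
    t_threads, r1 = timed(ThreadPoolExecutor)
    t_procs, r2 = timed(ProcessPoolExecutor)
    assert r1 == r2  # same answers either way
    print(f"threads:   {t_threads:.2f}s")
    print(f"processes: {t_procs:.2f}s")
```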


Getting multiprocessing right is hard. If you have 4 processors you can give each processor one quadrant of the grid. But the processes then have to share their results, and that communication costs you performance. They also need to be synchronised: every process must finish computing its part of a generation before the next generation can start, and the partial results must be merged back into one grid.

Multiprocessing starts to pay off when your grid is very big / there is a lot to calculate.
Since the question is very broad and complicated I cannot give you a better answer. There is a paper on parallelising Conway's Game of Life: http://www.shodor.org/media/content/petascale/materials/UPModules/GameOfLife/Life_Module_Document_pdf.pdf
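If you do want to try it anyway, here is a hedged sketch of the partition-and-merge idea (my own, not from the thread; the helper names `neighbour_sums`, `band_sums` and `multi_neighbour_sums` are made up). Each worker gets a horizontal band of the grid plus one "halo" row from each neighbouring band, so its edge sums come out right, and `Pool.map` collects the results, which the Process-based version in the question never did. The `if __name__ == "__main__":` guard is also what prevents the classic multiprocessing RuntimeError on platforms that spawn workers by re-importing the main module.

```python
import numpy as np
from multiprocessing import Pool, cpu_count

def neighbour_sums(cells):
    # zero-padded 8-neighbour sum; matches correlate2d(cells, kernel, mode='same')
    p = np.pad(cells, 1)
    return (p[:-2, :-2] + p[:-2, 1:-1] + p[:-2, 2:]
          + p[1:-1, :-2]               + p[1:-1, 2:]
          + p[2:, :-2]  + p[2:, 1:-1]  + p[2:, 2:])

def band_sums(args):
    band, top_halo, bottom_halo = args
    # stack the halo rows around the band so its edge rows see their real neighbours
    stacked = np.vstack([r for r in (top_halo, band, bottom_halo) if r is not None])
    sums = neighbour_sums(stacked)
    start = 0 if top_halo is None else 1  # drop the halo rows from the result
    return sums[start:start + band.shape[0]]

def multi_neighbour_sums(cells, workers=None):
    workers = workers or cpu_count()
    bands = np.array_split(cells, workers, axis=0)
    jobs, row = [], 0
    for band in bands:
        top = cells[row - 1:row] if row > 0 else None
        end = row + band.shape[0]
        bottom = cells[end:end + 1] if end < cells.shape[0] else None
        jobs.append((band, top, bottom))
        row = end
    with Pool(workers) as pool:
        return np.vstack(pool.map(band_sums, jobs))

if __name__ == "__main__":  # without this guard, spawned workers re-run the Pool creation
    grid = (np.random.rand(64, 64) > 0.5).astype(int)
    assert np.array_equal(multi_neighbour_sums(grid), neighbour_sums(grid))
```

Note that for a grid this small the process start-up and data copying will dominate; as the answer says, the vectorised conv_method will usually win until the grid is very large.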



Source: https://stackoverflow.com/questions/60339686/how-to-multiprocess-for-loops-in-python-where-each-calculation-is-independent
