I have a code structure that looks like this:
class A:
    def __init__(self):
        processes = []
        for i in range(1000):
            p = Process(target=self.RunProcess, args=i)
            processes.append[p]
There are a couple of syntax issues that I can see in your code:
args in Process expects a tuple, but you pass an integer; please change line 5 to:
p = Process(target=self.RunProcess, args=(i,))
list.append is a method, so its argument should be enclosed in (), not []; please change line 6 to:
processes.append(p)
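Note the trailing comma in args=(i,): (i) is just a parenthesised integer, whereas (i,) is a one-element tuple, which is what Process expects. With both fixes applied, the loop body becomes:

    p = Process(target=self.RunProcess, args=(i,))
    processes.append(p)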
As @qarma points out, it's not good practice to start the processes in the class constructor. I would structure the code as follows (adapting your example):
import multiprocessing as mp
from time import sleep

class A(object):
    def __init__(self, *args, **kwargs):
        # do other stuff
        pass

    def do_something(self, i):
        sleep(0.2)
        print('%s * %s = %s' % (i, i, i*i))

    def run(self):
        processes = []
        for i in range(1000):
            p = mp.Process(target=self.do_something, args=(i,))
            processes.append(p)
        [x.start() for x in processes]

if __name__ == '__main__':
    a = A()
    a.run()
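If the caller should wait for all the workers to finish, the usual addition (not part of the original answer) is to join them at the end of run():

    [x.join() for x in processes]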
A practical work-around is to break down your class, e.g. like this:
class A:
    def __init__(self, ...):
        pass

    def compute(self):
        procs = [Process(self.run, ...) for ... in ...]
        [p.start() for p in procs]
        [p.join() for p in procs]

    def run(self, ...):
        pass

pool = A(...)
pool.compute()
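Filled in with concrete placeholders (njobs and the squaring in run are just illustrative, not from the original), the same pattern might look like:

import multiprocessing as mp

class A:
    def __init__(self, njobs):
        self.njobs = njobs

    def compute(self):
        # Start the workers only when explicitly asked to, not in __init__.
        procs = [mp.Process(target=self.run, args=(i,)) for i in range(self.njobs)]
        [p.start() for p in procs]
        [p.join() for p in procs]

    def run(self, i):
        print(i * i)

if __name__ == '__main__':
    pool = A(10)
    pool.compute()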
When you fork a process inside __init__, the class instance self may not be fully initialised yet, so it's odd to ask a subprocess to execute self.run, although technically, yes, it's possible.
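A contrived sketch of why that is fragile (the attribute name data is purely for illustration):

from multiprocessing import Process

class Fragile:
    def __init__(self):
        p = Process(target=self.run)
        p.start()                     # child gets a snapshot of a half-built self
        self.data = list(range(10))   # only set *after* the child was started
        p.join()

    def run(self):
        # Fails with AttributeError in the child: self.data did not exist yet
        # when the process was forked/pickled.
        print(sum(self.data))

if __name__ == '__main__':
    Fragile()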
If it's not that, then it sounds like an instance of this issue:
http://bugs.python.org/issue11240
It should simplify things for you to use a Pool. As far as speed goes, starting up the processes does take time. However, using a Pool, as opposed to running njobs separate Process instances, should be about as fast as you can get it to run with processes. The default setting for a Pool (as used below) is to use the maximum number of processes available (i.e. the number of CPUs you have) and to keep farming out new jobs to a worker as soon as a job completes. You won't get njobs-way parallelism, but you'll get as much parallelism as your CPUs can handle without oversubscribing your processors.

I'm using pathos, which has a fork of multiprocessing, because it's a bit more robust than standard multiprocessing… and, well, I'm also the author. But you could probably use multiprocessing for this.
>>> from pathos.multiprocessing import ProcessingPool as Pool
>>> class A(object):
...     def __init__(self, njobs=1000):
...         self.map = Pool().map
...         self.njobs = njobs
...         self.start()
...     def start(self):
...         self.result = self.map(self.RunProcess, range(self.njobs))
...         return self.result
...     def RunProcess(self, i):
...         return i*i
...
>>> myA = A()
>>> myA.result[:11]
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
>>> myA.njobs = 3
>>> myA.start()
[0, 1, 4]
It's a bit of an odd design to start the Pool inside of __init__. But if you want to do that, you have to get results from something like self.result… and you can use self.start for subsequent calls.
Get pathos here: https://github.com/uqfoundation
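Since the answer notes that standard multiprocessing could probably be used as well, here is a minimal sketch of the same idea with the standard library on Python 3 (where bound methods pickle fine); the names run_process and njobs are just carried over for illustration:

import multiprocessing as mp

class A(object):
    def __init__(self, njobs=1000):
        self.njobs = njobs

    def start(self):
        # Build the pool here (not in __init__) so that `self` stays
        # picklable when the bound method is shipped to the workers.
        with mp.Pool() as pool:          # defaults to os.cpu_count() workers
            self.result = pool.map(self.run_process, range(self.njobs))
        return self.result

    def run_process(self, i):
        return i * i

if __name__ == '__main__':
    a = A(njobs=11)
    print(a.start())    # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100]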