I\'m having the following problem in python.
I need to do some calculations in parallel whose results I need to be written sequentially in a file. So I created a fu
You really should use two queues and three separate kinds of processing.
Put stuff into Queue #1.
Get stuff out of Queue #1 and do calculations, putting stuff in Queue #2. You can have many of these, since they get from one queue and put into another queue safely.
Get stuff out of Queue #2 and write it to a file. You must have exactly 1 of these and no more. It "owns" the file, guarantees atomic access, and absolutely assures that the file is written cleanly and consistently.
If anyone is looking for a simple way to do the same, this can help you. I don't think there are any disadvantages to doing it in this way. If there are, please let me know.
import multiprocessing
import re
def mp_worker(item):
# Do something
return item, count
def mp_handler():
cpus = multiprocessing.cpu_count()
p = multiprocessing.Pool(cpus)
# The below 2 lines populate the list. This listX will later be accessed parallely. This can be replaced as long as listX is passed on to the next step.
with open('ExampleFile.txt') as f:
listX = [line for line in (l.strip() for l in f) if line]
with open('results.txt', 'w') as f:
for result in p.imap(mp_worker, listX):
# (item, count) tuples from worker
f.write('%s: %d\n' % result)
if __name__=='__main__':
mp_handler()
Source: Python: Writing to a single file with queue while using multiprocessing Pool
There is a mistake in the write worker code, if the block is false, the worker will never get any data. Should be as follows:
par, res = queue.get(block = True)
You can check it by adding line
print "QSize",queueOut.qsize()
after the
queueOut.put((par,res))
With block=False you would be getting ever increasing length of the queue until it fills up, unlike with block=True where you get always "1".