Writing to a file with multiprocessing

孤街浪徒 2020-12-09 04:34

I'm having the following problem in Python.

I need to do some calculations in parallel, and their results need to be written to a file sequentially. So I created a fu

3 Answers
  • 2020-12-09 05:03

    You really should use two queues and three separate kinds of processing.

    1. Put stuff into Queue #1.

    2. Get stuff out of Queue #1 and do calculations, putting stuff in Queue #2. You can have many of these, since they get from one queue and put into another queue safely.

    3. Get stuff out of Queue #2 and write it to a file. You must have exactly one of these and no more: it "owns" the file, guarantees atomic access, and ensures that the file is written cleanly and consistently. A minimal sketch of this layout follows.
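
    A minimal sketch of that layout, using multiprocessing.Queue for both queues, might look like the following. The SENTINEL marker, the placeholder calculation, and the worker count are illustrative assumptions, not part of the answer above.

    import multiprocessing

    SENTINEL = None  # hypothetical stop marker passed through the queues

    def producer(task_queue, items, n_workers):
        # Stage 1: put the work items into Queue #1.
        for item in items:
            task_queue.put(item)
        for _ in range(n_workers):
            task_queue.put(SENTINEL)  # one stop marker per calculation worker

    def calculator(task_queue, result_queue):
        # Stage 2: get from Queue #1, do the calculation, put into Queue #2.
        while True:
            item = task_queue.get()
            if item is SENTINEL:
                break
            result_queue.put((item, item * item))  # placeholder calculation

    def writer(result_queue, path):
        # Stage 3: the only process that touches the file.
        with open(path, 'w') as f:
            while True:
                result = result_queue.get()
                if result is SENTINEL:
                    break
                f.write('%s: %s\n' % result)

    if __name__ == '__main__':
        task_queue = multiprocessing.Queue()
        result_queue = multiprocessing.Queue()
        n_workers = 4  # assumed worker count

        workers = [multiprocessing.Process(target=calculator,
                                           args=(task_queue, result_queue))
                   for _ in range(n_workers)]
        writer_proc = multiprocessing.Process(target=writer,
                                              args=(result_queue, 'results.txt'))

        for w in workers:
            w.start()
        writer_proc.start()

        producer(task_queue, range(100), n_workers)

        for w in workers:
            w.join()
        result_queue.put(SENTINEL)  # all calculators are done; tell the writer to stop
        writer_proc.join()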

  • 2020-12-09 05:23

    If anyone is looking for a simple way to do the same thing, this may help. I don't see any disadvantages to doing it this way; if there are, please let me know.

    import multiprocessing
    import re

    def mp_worker(item):
        # Do something with the line; here, count its words as a placeholder calculation
        count = len(re.findall(r'\w+', item))
        return item, count

    def mp_handler():
        cpus = multiprocessing.cpu_count()
        p = multiprocessing.Pool(cpus)
        # The next two lines populate listX with the non-empty lines of the input file.
        # Any other way of building listX works, as long as it is passed to imap below.
        with open('ExampleFile.txt') as f:
            listX = [line for line in (l.strip() for l in f) if line]
        with open('results.txt', 'w') as f:
            # imap yields the (item, count) tuples from the workers
            for result in p.imap(mp_worker, listX):
                f.write('%s: %d\n' % result)
        p.close()
        p.join()

    if __name__ == '__main__':
        mp_handler()
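
    Note that Pool.imap hands results back in the same order as the input iterable, which is what keeps results.txt sequential; if the output order did not matter, Pool.imap_unordered would yield each result as soon as it finishes.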
    

    Source: Python: Writing to a single file with queue while using multiprocessing Pool

  • 2020-12-09 05:25

    There is a mistake in the write worker code: if block is false, the worker will never get any data. It should be as follows:

    par, res = queue.get(block=True)
    

    You can check it by adding the line

    print("QSize", queueOut.qsize())
    

    after the queueOut.put((par, res)) call.

    With block=False you would see the queue length grow and grow until the queue fills up, whereas with block=True you always get "1".
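
    For reference, a blocking writer loop along these lines might look like the sketch below; the write_worker name, the None sentinel, and the output file name are assumptions standing in for the asker's original code.

    def write_worker(queueOut, path):
        # Single writer process: a blocking get() waits for the next result
        # instead of spinning, and a None sentinel tells it to stop.
        with open(path, 'w') as f:
            while True:
                item = queueOut.get(block=True)
                if item is None:
                    break
                par, res = item
                f.write('%s: %s\n' % (par, res))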
