Python: Writing to a single file with queue while using multiprocessing Pool

Asked by 执念已碎 on 2020-12-04 16:28 · 3 answers · 891 views

I have hundreds of thousands of text files that I want to parse in various ways. I want to save the output to a single file without synchronization problems. I have been using a multiprocessing Pool.

3 Answers
  •  暗喜 (original poster)
     answered 2020-12-04 16:44

    A multiprocessing Pool already implements a queue for you: just use a pool method that hands each worker's return value back to the caller, and write from the single parent process. imap works well because it yields results in input order as they complete:

    import multiprocessing
    import re
    
    def mp_worker(filename):
        with open(filename) as f:
            text = f.read()
        runs = re.findall("x+", text)
        # length of the longest run of 'x' characters; 0 if the file has none
        # (the original max(m, key=len) raised ValueError on files with no match)
        count = max((len(r) for r in runs), default=0)
        return filename, count
    
    def mp_handler():
        with open('infilenamess.txt') as f:
            filenames = [line for line in (l.strip() for l in f) if line]
        # the with-block closes and joins the pool when the loop finishes
        with multiprocessing.Pool(32) as p, open('results.txt', 'w') as out:
            for filename, count in p.imap(mp_worker, filenames):
                # only the parent process writes, so no locking is needed
                out.write('%s: %d\n' % (filename, count))
    
    if __name__ == '__main__':
        mp_handler()
    
