Multiprocess multiple files in a list

Submitted by 亡梦爱人 on 2019-11-27 21:56:39

Question


I am trying to read N .csv files stored in a list, processing them concurrently.

Right now I do the following:

import multiprocessing

  1. Empty list
  2. Append list with listdir of .csv's
  3. def A() -- even files (list[::2])
  4. def B() -- odd files (list[1::2])
  5. Process 1 def A()
  6. Process 2 def B()

    import glob
    from multiprocessing import Process

    def read_even(file_list):
        for f in file_list[::2]:    # even-indexed files
            pass                    # read/parse f here

    def read_odd(file_list):
        for f in file_list[1::2]:   # odd-indexed files
            pass                    # read/parse f here

    def read_all_lead_files(folder):
        file_list = glob.glob(folder + "*.csv")

        p1 = Process(target=read_even, args=(file_list,))
        p1.start()
        p2 = Process(target=read_odd, args=(file_list,))
        p2.start()


Is there a faster way to partition the list of files across the worker processes?


Answer 1:


I'm guessing at what you want here, because the original question is quite unclear. Since os.listdir doesn't guarantee an ordering, I'm assuming your "two" functions are actually identical and you just need to perform the same process on multiple files simultaneously.

The easiest way to do this, in my experience, is to spin up a Pool, launch a process for each file, and then wait. e.g.

import glob
import multiprocessing

def process(file):
    pass  # do stuff to a file

p = multiprocessing.Pool()
for f in glob.glob(folder + "*.csv"):  # `folder` as in the question
    # Queue a task for each file.
    # The pool limits this to roughly one worker process per available CPU core.
    p.apply_async(process, [f])

p.close()
p.join()  # Wait for all child processes to finish.
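If the parent process also needs the parsed contents back, Pool.map collects the return values for you. A minimal sketch of that variant (the read_file worker and the folder value are illustrative assumptions, not part of the original answer):

import csv
import glob
import multiprocessing

def read_file(path):
    # Hypothetical worker: parse one CSV and return its rows.
    with open(path, newline="") as fh:
        return list(csv.reader(fh))

if __name__ == "__main__":
    folder = "data/"  # assumed location of the .csv files
    with multiprocessing.Pool() as pool:
        # map() blocks until every file has been processed and returns
        # the results in the same order as the input paths.
        results = pool.map(read_file, glob.glob(folder + "*.csv"))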


Source: https://stackoverflow.com/questions/23794207/multiprocess-multiple-files-in-a-list
