multiprocessing.pool.map and a function with two arguments

Asked by 臣服心动 on 2020-12-15 10:54

I am using multiprocessing.Pool()

Here is what I want to run through the pool:

    def insert_and_process(file_to_process, db):
        db = DAL("path_to_mysql" + db)
        # Table definitions
        db.table.insert(**parse_file(file_to_process))
        return True

Pool.map() only passes a single argument to the worker function, so I don't see how to hand it both the file name and the db argument.
5 Answers
  • 2020-12-15 11:08

    The Pool documentation does not describe a way of passing more than one parameter to the target function - I've tried just passing a sequence, but it does not get unpacked (one item of the sequence per parameter).

    However, you can write your target function to expect the first (and only) parameter to be a tuple, in which each element is one of the parameters you are expecting:

    import os
    from itertools import repeat
    from multiprocessing import Pool

    def insert_and_process((file_to_process, db)):  # Python 2 tuple-parameter syntax
        db = DAL("path_to_mysql" + db)
        # Table definitions
        db.table.insert(**parse_file(file_to_process))
        return True

    if __name__ == "__main__":
        file_list = os.listdir(".")
        db = "your_db_suffix"  # placeholder: however you obtain your db argument
        P = Pool(processes=4)
        P.map(insert_and_process, zip(file_list, repeat(db)))
    

    (Note the extra parentheses in the definition of insert_and_process - Python 2 treats that as a single parameter that must be a 2-item sequence: the first element is assigned to the first variable, the second to the other. This tuple-parameter syntax was removed in Python 3.)
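
    On Python 3, the same idea still works if you unpack the tuple inside the function instead; here is a minimal sketch under the question's assumptions (DAL and parse_file come from the asker's environment, and db is a hypothetical placeholder):

    import os
    from itertools import repeat
    from multiprocessing import Pool

    def insert_and_process(args):
        file_to_process, db = args  # unpack the single tuple argument
        db = DAL("path_to_mysql" + db)
        db.table.insert(**parse_file(file_to_process))
        return True

    if __name__ == "__main__":
        file_list = os.listdir(".")
        db = "your_db_suffix"  # hypothetical placeholder
        P = Pool(processes=4)
        P.map(insert_and_process, zip(file_list, repeat(db)))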

  • 2020-12-15 11:11

    Your pool will spawn four processes, each run by its own instance of the Python interpreter. You can use a global variable to hold your database connection object, so that exactly one connection is created per process:

    import os
    from multiprocessing import Pool

    global_db = None

    def insert_and_process(file_to_process, db):
        global global_db
        if global_db is None:
            # If this is the first time this function is called within this
            # process, create a new connection.  Otherwise, the global variable
            # already holds a connection established by a former call.
            global_db = DAL("path_to_mysql" + db)
        global_db.table.insert(**parse_file(file_to_process))
        return True
    

    Since Pool.map() and friends only support one-argument worker functions, you need to create a wrapper that forwards the work:

    def insert_and_process_helper(args):
        return insert_and_process(*args)
    
    if __name__ == "__main__":
        file_list = os.listdir(".")
        db = "wherever you get your db"
        # Create argument tuples for each function call:
        jobs = [(file, db) for file in file_list]
        P = Pool(processes=4)
        P.map(insert_and_process_helper, jobs)
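
    As a side note, Python 3.3+ offers Pool.starmap(), which unpacks each argument tuple itself, so the helper wrapper becomes unnecessary; a minimal sketch reusing the jobs list above:

    # Each (file, db) tuple is unpacked into insert_and_process(file, db)
    P = Pool(processes=4)
    P.starmap(insert_and_process, jobs)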
    
  • 2020-12-15 11:12

    No need to use zip. If, for example, you have two parameters, x and y, and each of them can take several values, like:

    X = range(1, 6)
    Y = range(10)
    

    The function should get only one parameter, and unpack it inside:

    def func(params):
        x, y = params
        ...
    

    And you call it like this:

    params = [(x, y) for x in X for y in Y]
    pool.map(func, params)
    
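    A runnable version of this pattern, with a hypothetical body for func (it just multiplies the pair to have something concrete):

    from multiprocessing import Pool

    def func(params):
        x, y = params  # unpack the single tuple argument
        return x * y   # hypothetical work

    if __name__ == "__main__":
        X = range(1, 6)
        Y = range(10)
        params = [(x, y) for x in X for y in Y]
        with Pool(processes=4) as pool:
            print(pool.map(func, params))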
  • 2020-12-15 11:16

    You can use functools.partial for this purpose. partial() pins the leading arguments (here lat and lng), so pool.map() only needs to supply the remaining one:

    from functools import partial

    func = partial(rdc, lat, lng)
    r = pool.map(func, range(8))

    where rdc is defined as:

    def rdc(lat, lng, x):
        pass
    
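    Putting it together as a runnable sketch (the coordinates and the body of rdc are hypothetical placeholders):

    from functools import partial
    from multiprocessing import Pool

    def rdc(lat, lng, x):
        # lat and lng are pinned by partial(); pool.map supplies x
        return (lat + x, lng + x)  # hypothetical computation

    if __name__ == "__main__":
        lat, lng = 52.5, 13.4  # placeholder coordinates
        func = partial(rdc, lat, lng)
        with Pool(processes=4) as pool:
            r = pool.map(func, range(8))
        print(r)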
  • 2020-12-15 11:28

    Using

    params = [(x, y) for x in X for y in Y]
    

    you build the complete list of parameter tuples in memory up front, and that may be slower than using

    from itertools import repeat
    P.map(insert_and_process, zip(file_list, repeat(db)))
    
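    Note, though, that Pool.map() converts its input to a list internally, so zip()/repeat() mainly avoids building the intermediate list in your own code. If memory is the real concern, Pool.imap() consumes the iterable lazily (reusing the names from the first answers):

    from itertools import repeat
    # Lazy variant: results arrive as an iterator instead of a list
    for ok in P.imap(insert_and_process, zip(file_list, repeat(db))):
        pass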