I am using multiprocessing.Pool(). Here is what I want to Pool:

def insert_and_process(file_to_process, db):
    db = DAL("path_to_mysql" + db)
The Pool documentation does not describe a way of passing more than one parameter to the target function. I've tried just passing a sequence, but it does not get unpacked (one item of the sequence for each parameter).
However, you can write your target function to expect the first (and only) parameter to be a tuple, in which each element is one of the parameters you are expecting:
import os
from itertools import repeat
from multiprocessing import Pool

def insert_and_process((file_to_process, db)):
    db = DAL("path_to_mysql" + db)
    # Table definitions
    db.table.insert(**parse_file(file_to_process))
    return True

if __name__ == "__main__":
    file_list = os.listdir(".")
    P = Pool(processes=4)
    P.map(insert_and_process, zip(file_list, repeat(db)))
(Note the extra parentheses in the definition of insert_and_process: Python 2 treats that as a single parameter that should be a 2-item sequence. The first element of the sequence is assigned to the first variable, and the other to the second. This tuple-parameter syntax was removed in Python 3, per PEP 3113.)
Your pool will spawn four processes, each run by its own instance of the Python interpreter. You can use a global variable to hold your database connection object, so that exactly one connection is created per process:
global_db = None

def insert_and_process(file_to_process, db):
    global global_db
    if global_db is None:
        # If this is the first time this function is called within this
        # process, create a new connection. Otherwise, the global variable
        # already holds a connection established by a former call.
        global_db = DAL("path_to_mysql" + db)
    global_db.table.insert(**parse_file(file_to_process))
    return True
Since Pool.map() and friends only support one-argument worker functions, you need to create a wrapper that forwards the work:
import os
from multiprocessing import Pool

def insert_and_process_helper(args):
    return insert_and_process(*args)

if __name__ == "__main__":
    file_list = os.listdir(".")
    db = "wherever you get your db"
    # Create argument tuples for each function call:
    jobs = [(file, db) for file in file_list]
    P = Pool(processes=4)
    P.map(insert_and_process_helper, jobs)
No need to use zip. If, for example, you have two parameters, x and y, and each of them can take several values, like:

X = range(1, 6)
Y = range(10)
The function should take only one parameter and unpack it inside:

def func(params):
    (x, y) = params
    ...
And you call it like this:

params = [(x, y) for x in X for y in Y]
pool.map(func, params)
You can use functools.partial for this purpose:

from functools import partial

func = partial(rdc, lat, lng)
r = pool.map(func, range(8))

with

def rdc(lat, lng, x):
    pass

partial binds lat and lng up front, so pool.map only has to supply the one remaining argument, x.
Using

params = [(x, y) for x in X for y in Y]

you create a full copy of x and y, and that may be slower than using

from itertools import repeat
P.map(insert_and_process, zip(file_list, repeat(db)))
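For comparison, repeat(db) yields the same single object over and over, so zip pairs it with each file without building per-pair copies. A quick check with hypothetical file names:

```python
from itertools import repeat

file_list = ["a.csv", "b.csv", "c.csv"]  # hypothetical file names
db = "mydb"

# zip stops at the shorter iterable, so the infinite repeat(db) is safe here.
pairs = list(zip(file_list, repeat(db)))

# Every pair holds a reference to the same db object, not a copy.
assert all(p[1] is db for p in pairs)
```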