python-multiprocessing

How to use boto3 client with Python multiprocessing?

Submitted by 五迷三道 on 2019-12-05 15:45:25
Code looks something like this:

import multiprocessing as mp
from functools import partial

import boto3
import numpy as np

s3 = boto3.client('s3')

def _something(**kwargs):
    # Some mixed integer programming stuff related to the variable archive
    return np.array(some_variable_related_to_archive)

def do(s3):
    archive = np.load(s3.get_object('some_key'))  # Simplified -- details not relevant
    pool = mp.Pool()
    sub_process = partial(_something, slack=0.1)
    parts = np.array_split(archive, some_int)
    target_parts = np.array(things)
    out = pool.starmap(sub_process, [x for x in zip(parts, target_parts)] #
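The excerpt cuts off before any answer, but a common way to make this kind of code work is to stop sharing a single boto3 client across processes and instead build a client inside each worker. The sketch below assumes exactly that; the bucket and key arguments, the .npy payload and the worker body are placeholders, not the original code.

import io
import multiprocessing as mp
from functools import partial

import boto3
import numpy as np

def _something(part, target, slack=0.1):
    # Hypothetical worker: build the client inside the child process, since a
    # boto3 client held by the parent is not reliably picklable/shareable.
    s3 = boto3.client('s3')  # used for whatever per-part S3 access the real worker needs
    return part.sum() * slack + target.sum()   # placeholder computation

def do(bucket, key, n_parts=4):
    s3 = boto3.client('s3')
    body = s3.get_object(Bucket=bucket, Key=key)['Body'].read()
    archive = np.load(io.BytesIO(body))        # assumes the object is an .npy file
    parts = np.array_split(archive, n_parts)
    targets = np.array_split(archive, n_parts)  # placeholder for target_parts
    with mp.Pool() as pool:
        out = pool.starmap(partial(_something, slack=0.1), zip(parts, targets))
    return out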

Applying two functions to two lists simultaneously using Pool and multiprocessing

Submitted by 时光毁灭记忆、已成空白 on 2019-12-05 14:55:58
I have a (large) list of male and female agents. I want to apply a different function to each group. How can I use Pool in such a case, given that the agents are independent of each other? An example would be:

males = ['a', 'b', 'c']
females = ['d', 'e', 'f']

for m in males:
    func_m(m)
for f in females:
    func_f(f)

I started like this:

from multiprocessing import Pool
p = Pool(processes=2)
p.map()  # Here is the problem

I would like to have something like:

p.ZIP(func_f for f in females, func_m for m in males)  # pseudocode

It is possible to launch the computation asynchronously using map_async . This
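A sketch of the approach the cut-off sentence is pointing at: submit both groups to the same pool with map_async, which queues the work without blocking, and then collect both results. The two placeholder functions just upper-case the letters.

from multiprocessing import Pool

def func_m(m):
    return ('male', m.upper())     # placeholder work for a male agent

def func_f(f):
    return ('female', f.upper())   # placeholder work for a female agent

if __name__ == '__main__':
    males = ['a', 'b', 'c']
    females = ['d', 'e', 'f']
    with Pool(processes=2) as p:
        # map_async returns immediately, so both groups are queued before
        # either result is collected; .get() then blocks until done.
        res_m = p.map_async(func_m, males)
        res_f = p.map_async(func_f, females)
        print(res_m.get())
        print(res_f.get())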

Python multiprocessing pool stuck

Submitted by 不羁岁月 on 2019-12-05 13:12:11
I'm trying to run some sample code for the multiprocessing.pool module of Python, found on the web. The code is:

from multiprocessing import Pool

def square(x):
    return x * x

if __name__ == '__main__':
    pool = Pool(processes=4)
    inputs = [0, 1, 2, 3, 4]
    outputs = pool.map(square, inputs)

But when I try to run it, it never finishes executing and I have to restart the kernel of my IPython notebook. What's the problem? KT. As you may read from the answer pointed out by John in the comments, multiprocessing.Pool , in general, should not be expected to work well within an interactive interpreter. To understand why it is the
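A sketch of the workaround usually suggested for notebooks: put the worker function in an importable module (here a hypothetical workers.py) so the child processes can find it, and keep the __main__ guard. Functions defined interactively often cannot be located by spawned workers, which is one way the pool ends up hanging instead of erroring.

# workers.py  (hypothetical module name; the worker must live in a file the
# child processes can import, not in a notebook cell)
def square(x):
    return x * x

# back in the notebook, or in a plain script
from multiprocessing import Pool
from workers import square

if __name__ == '__main__':
    with Pool(processes=4) as pool:
        print(pool.map(square, [0, 1, 2, 3, 4]))    # -> [0, 1, 4, 9, 16]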

Why does memory consumption increase dramatically in `Pool.map()` multiprocessing?

Submitted by 一个人想着一个人 on 2019-12-05 12:40:16
I am doing multiprocessing on a pandas dataframe by splitting it into several dataframes, which are stored in a list. Then, using Pool.map() , I pass each dataframe to a defined function. My input file is about 300 MB, so the small dataframes are roughly 75 MB each. But when the multiprocessing runs, memory consumption increases by 7 GB and each local process consumes about 2 GB of memory. Why is this happening?

def main():
    my_df = pd.read_table("my_file.txt", sep="\t")
    my_df = my_df.groupby('someCol')
    my_df_list = []
    for colID, colData in my_df:
        my_df_list.append(colData)
    # now
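The excerpt stops before an answer; a plausible contributor is that every sub-frame gets pickled into a child process and the results pickled back while the parent still holds the full list of frames, so several copies coexist. The sketch below, with a placeholder per-group function, streams groups to the workers and returns only small summaries; it assumes the real per-group work can be expressed that way.

import multiprocessing as mp
import pandas as pd

def process_group(item):
    colID, colData = item
    # placeholder per-group work: return something small, not the frame itself
    return colID, len(colData)

def main():
    my_df = pd.read_table("my_file.txt", sep="\t")
    grouped = my_df.groupby('someCol')      # iterating yields (key, sub-frame) pairs
    with mp.Pool(processes=4) as pool:
        # imap hands groups to the workers one at a time instead of
        # materialising the whole list of sub-frames before dispatch
        for colID, nrows in pool.imap(process_group, iter(grouped), chunksize=1):
            print(colID, nrows)

if __name__ == '__main__':
    main()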

multiprocessing on tee'd generators

Submitted by 心已入冬 on 2019-12-05 09:43:33
Consider the following script, in which I test two ways of performing some calculations on generators obtained by itertools.tee :

#!/usr/bin/env python3
from sys import argv
from itertools import tee
from multiprocessing import Process

def my_generator():
    for i in range(5):
        print(i)
        yield i

def double(x):
    return 2 * x

def compute_double_sum(iterable):
    s = sum(map(double, iterable))
    print(s)

def square(x):
    return x * x

def compute_square_sum(iterable):
    s = sum(map(square, iterable))
    print(s)

g1, g2 = tee(my_generator(), 2)

try:
    processing_type = argv[1]
except IndexError:
    processing_type = "no
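For reference, a sketch of the process-based variant under the fork start method (Linux or macOS with fork): each child inherits its own copy of the tee'd generators, so the two sums are computed independently and my_generator's body runs once per child rather than being shared between them.

from itertools import tee
from multiprocessing import Process

def my_generator():
    for i in range(5):
        print(i)
        yield i

def compute_double_sum(iterable):
    print(sum(2 * x for x in iterable))

def compute_square_sum(iterable):
    print(sum(x * x for x in iterable))

if __name__ == '__main__':
    g1, g2 = tee(my_generator(), 2)
    # With fork, each child gets a copy of g1/g2 and advances its copy of the
    # underlying generator separately (the 0..4 prints appear twice); with the
    # spawn start method, generators cannot be pickled and this fails outright.
    p1 = Process(target=compute_double_sum, args=(g1,))
    p2 = Process(target=compute_square_sum, args=(g2,))
    p1.start(); p2.start()
    p1.join(); p2.join()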

Apply reduce on generator output with multiprocessing

Submitted by 爱⌒轻易说出口 on 2019-12-05 08:53:57
I have a generator function (Python) that works like this:

def Mygenerator(x, y, z, ...):
    while True:
        # code that makes two matrices based on sequences of input arrays
        yield (matrix1, matrix2)

What I want to do is add up the output from this generator. This line does the job:

M1, M2 = reduce(lambda x, y: (x[0] + y[0], x[1] + y[1]), Mygenerator(x, y, z, ...))

I would like to parallelize this to speed up the computations. It is important that the output from Mygenerator is reduced as it is yielded, since list(Mygenerator(...)) would take too much memory. To answer my own question, I found a
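Since the self-answer is cut off, here is a generic sketch of reducing as results arrive: imap_unordered yields pairs as workers finish them, so each pair is folded into running sums without ever building a list. make_matrices is a hypothetical stand-in for one item of Mygenerator's output.

import multiprocessing as mp
import numpy as np

def make_matrices(seed):
    # hypothetical stand-in for one (matrix1, matrix2) pair from Mygenerator
    rng = np.random.default_rng(seed)
    return rng.random((3, 3)), rng.random((3, 3))

def reduced_sums(seeds, processes=4):
    M1 = np.zeros((3, 3))
    M2 = np.zeros((3, 3))
    with mp.Pool(processes) as pool:
        # fold each pair into the running sums as soon as a worker returns it,
        # instead of holding all pairs in memory at once
        for m1, m2 in pool.imap_unordered(make_matrices, seeds):
            M1 += m1
            M2 += m2
    return M1, M2

if __name__ == '__main__':
    M1, M2 = reduced_sums(range(100))
    print(M1.shape, M2.shape)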

Chunksize irrelevant for multiprocessing / pool.map in Python?

Submitted by 梦想与她 on 2019-12-05 07:48:51
I am trying to use the Pool multiprocessing functionality of Python. Regardless of how I set the chunk size (under Windows 7 and Ubuntu - the latter, with 4 cores, shown below), the number of parallel workers seems to stay the same.

from multiprocessing import Pool
from multiprocessing import cpu_count
import multiprocessing
import time

def f(x):
    print("ready to sleep", x, multiprocessing.current_process())
    time.sleep(20)
    print("slept with:", x, multiprocessing.current_process())

if __name__ == '_
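A short sketch of the distinction the question runs into, assuming "parallel threads" here means concurrent pool workers: chunksize only batches inputs per task, while processes= sets how many workers run at once, so varying chunksize leaves the degree of parallelism unchanged.

from multiprocessing import Pool, cpu_count
import multiprocessing
import time

def f(x):
    print("working on", x, multiprocessing.current_process().name)
    time.sleep(1)
    return x * x

if __name__ == '__main__':
    # Parallelism comes from processes=..., not from chunksize.  chunksize
    # only decides how many inputs each worker grabs per task, trading
    # dispatch overhead against load balancing.
    with Pool(processes=cpu_count()) as pool:
        print(pool.map(f, range(8), chunksize=2))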

How to retrieve values from a function run in parallel processes?

Submitted by 守給你的承諾、 on 2019-12-05 05:26:29
The multiprocessing module is quite confusing for Python beginners, especially for those who have just migrated from MATLAB and have been made lazy by its parallel computing toolbox. I have the following code, which takes ~80 seconds to run, and I want to shorten this time by using the multiprocessing module of Python.

from time import time

xmax = 100000000
start = time()
for x in range(xmax):
    y = ((x+5)**2+x-40)
    if y <= 0xf+1:
        print('Condition met at: ', y, x)
end = time()
tt = end-start  # total time
print('Each iteration took: ', tt/xmax)
print('Total time: ', tt)

This outputs as expected: Condition met
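One common pattern, sketched here with an illustrative four-way split of the range: each worker returns its hits as a list, and Pool.map collects those lists in the parent, which answers the "how do I retrieve values" part of the question.

from multiprocessing import Pool
from time import time

def check_range(bounds):
    lo, hi = bounds
    hits = []
    for x in range(lo, hi):
        y = (x + 5) ** 2 + x - 40
        if y <= 0xf + 1:
            hits.append((y, x))
    return hits                      # returned to the parent instead of printed

if __name__ == '__main__':
    xmax = 100000000
    nproc = 4                        # illustrative worker count
    step = xmax // nproc
    chunks = [(lo, min(lo + step, xmax)) for lo in range(0, xmax, step)]
    start = time()
    with Pool(nproc) as pool:
        results = pool.map(check_range, chunks)   # one list of hits per chunk
    hits = [hit for chunk in results for hit in chunk]
    print(hits)
    print('Total time:', time() - start)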

dask computation not executing in parallel

Submitted by 二次信任 on 2019-12-05 03:45:54
I have a directory of json files that I am trying to convert to a dask DataFrame and save to castra. There are 200 files containing O(10**7) json records between them. The code is very simple, largely following the tutorial examples.

import dask.dataframe as dd
import dask.bag as db
import json

txt = db.from_filenames('part-*.json')
js = txt.map(json.loads)
df = js.to_dataframe()
cs = df.to_castra("data.castra")

I am running it on a 32-core machine, but the code only utilizes one core at 100%. My understanding from the docs is that this code should execute in parallel. Why is it not? Did I misunderstand
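The excerpt ends before any answer; one thing worth checking is which scheduler executes the graph, since json.loads is pure-Python, GIL-bound work that gains little from threads. The sketch below uses the current dask names (read_text instead of from_filenames, and no Castra, which has long been retired) and explicitly requests the process-based scheduler; whether that was the original culprit is an assumption.

import json
import dask.bag as db

# db.read_text is the current spelling of the old db.from_filenames
txt = db.read_text('part-*.json')
js = txt.map(json.loads)
df = js.to_dataframe()

# json.loads holds the GIL, so a thread-based scheduler tends to peg a single
# core; the process-based scheduler spreads the parsing across the cores.
result = df.compute(scheduler='processes')
print(len(result))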

No space left while using Multiprocessing.Array in shared memory

Submitted by 强颜欢笑 on 2019-12-05 03:34:54
I am using the multiprocessing functions of Python to run my code in parallel on a machine with roughly 500 GB of RAM. To share some arrays between the different workers I am creating an Array object:

N = 150
ndata = 10000
sigma = 3
ddim = 3

shared_data_base = multiprocessing.Array(ctypes.c_double, ndata*N*N*ddim*sigma*sigma)
shared_data = np.ctypeslib.as_array(shared_data_base.get_obj())
shared_data = shared_data.reshape(-1, N, N, ddim*sigma*sigma)

This works perfectly for sigma=1 , but for sigma=3 one of the hard drives of the machine slowly fills up, until there is no free space anymore and
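The excerpt is cut off, but the symptom (a disk filling rather than RAM) matches the way multiprocessing.Array is backed by a temporary file under tempfile.gettempdir() on Linux; at sigma=3 the buffer is about 10000*150*150*3*9*8 bytes, roughly 49 GB, which can easily exceed a small /tmp partition. A sketch of one workaround, assuming a RAM-backed /dev/shm is available on this 500 GB machine: point TMPDIR somewhere with enough room before the Array is created.

import ctypes
import multiprocessing
import os
import tempfile

import numpy as np

# Must happen before the first shared Array is created, because multiprocessing
# caches its temporary directory once it has been chosen.
os.environ['TMPDIR'] = '/dev/shm'   # assumption: a large RAM-backed tmpfs exists here
tempfile.tempdir = None             # make tempfile re-read TMPDIR

N, ndata, sigma, ddim = 150, 10000, 3, 3
shared_data_base = multiprocessing.Array(
    ctypes.c_double, ndata * N * N * ddim * sigma * sigma)
shared_data = np.ctypeslib.as_array(shared_data_base.get_obj())
shared_data = shared_data.reshape(-1, N, N, ddim * sigma * sigma)
print(shared_data.nbytes / 1e9, "GB of shared memory")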