python-multiprocessing

Multiprocessing in Python uses only one process

微笑、不失礼 submitted on 2019-12-10 11:42:32
Question: I am trying to learn multiprocessing with Python. I wrote a simple script that should feed each process 1000 lines from a txt input file. My main function reads a line, splits it, and then performs some very simple operations on the elements of the string. Eventually the results should be written to an output file. When I run it, 4 processes are correctly spawned, but only one process is actually running, with minimal CPU. As a result the code is very slow and defies the purpose to use …
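One sketch of a pattern that keeps all four workers busy (the file name, chunk size, and worker body are placeholders, not the asker's code): read the lines once in the parent, split them into chunks of 1000, and let a Pool map over the chunks.

```python
from multiprocessing import Pool

def process_chunk(lines):
    # Hypothetical worker: split each line and do some simple work on the fields.
    results = []
    for line in lines:
        fields = line.split()
        results.append(len(fields))
    return results

if __name__ == "__main__":
    with open("input.txt") as fh:            # hypothetical input file
        lines = fh.readlines()
    chunks = [lines[i:i + 1000] for i in range(0, len(lines), 1000)]
    with Pool(processes=4) as pool:
        for partial in pool.map(process_chunk, chunks):
            print(partial[:5])               # or write the results to an output file
```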

In Python multiprocessing.Process, do we have to use `__name__ == '__main__'`?

安稳与你 submitted on 2019-12-10 11:39:16
Question: I am writing a class that provides an easy-to-use API to add different settings for running a given program ( class.add(args) ) and to benchmark all settings with multiprocessing ( class.benchmark(num_processes=5) ). From the documentation of multiprocessing.Process, it seems every example uses if __name__ == '__main__'. Is it safe to skip it? For example, the class method benchmark(num_processes=5) starts and joins processes, and another Python file file.py creates a class and simply calls class …
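Whether the guard can be skipped depends on the start method: with fork (the default on Linux) it often works without it, but with spawn (Windows, and macOS on newer Pythons) every child re-imports the entry script, so whichever script triggers benchmark() needs the guard. A rough sketch, with the Benchmark class and file layout invented for illustration:

```python
# benchmarks.py -- hypothetical module that owns the class
from multiprocessing import Process

def _run(setting):
    print("running", setting)

class Benchmark:
    def __init__(self):
        self.settings = []

    def add(self, setting):
        self.settings.append(setting)

    def benchmark(self, num_processes=5):
        procs = [Process(target=_run, args=(s,)) for s in self.settings[:num_processes]]
        for p in procs:
            p.start()
        for p in procs:
            p.join()

# file.py -- the entry point needs the guard, because spawned children
# re-import this module and would otherwise start processes recursively.
if __name__ == "__main__":
    b = Benchmark()
    b.add("setting-1")
    b.add("setting-2")
    b.benchmark(num_processes=2)
```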

Threading in Python takes longer instead of making it faster?

青春壹個敷衍的年華 submitted on 2019-12-10 11:34:19
Question: I wrote three different scripts to compare running with threads vs. without threads, basically measuring how much time I save by using threading, and the results didn't make any sense. Here is my code:

import time

def Function():
    global x
    x = 0
    while x < 300000000:
        x += 1
    print x

e1 = time.clock()
E1 = time.time()
Function()
e2 = time.clock()
E2 = time.time()
print e2 - e1
print E2 - E1

When I ran this, I got this output: 26.6358742929 26.6440000534. Then I wrote another function as shown below and …
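The usual explanation is CPython's GIL: for a CPU-bound loop like this, threads take turns holding the interpreter rather than running in parallel, so a threaded version is at best no faster and often slower because of switching overhead. A minimal Python 3 sketch contrasting threads with processes on the same counting loop (the function and the count are illustrative, not the asker's exact code):

```python
import time
from threading import Thread
from multiprocessing import Process

N = 30_000_000

def count(n):
    x = 0
    while x < n:
        x += 1

def timed(workers):
    start = time.perf_counter()
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return time.perf_counter() - start

if __name__ == "__main__":
    # Two threads share one GIL, so CPU-bound work is effectively serialized.
    print("threads:  ", timed([Thread(target=count, args=(N,)) for _ in range(2)]))
    # Two processes each get their own interpreter and run truly in parallel.
    print("processes:", timed([Process(target=count, args=(N,)) for _ in range(2)]))
```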

Cupy gets an error in a multiprocessing pool if the GPU was already used

让人想犯罪 __ submitted on 2019-12-10 11:22:47
Question: I tried to use cupy in two parts of my program, one of them being parallelized with a pool. I managed to reproduce the problem with a simple example:

import cupy
import numpy as np
from multiprocessing import pool

def f(x):
    return cupy.asnumpy(2 * cupy.array(x))

input = np.array([1, 2, 3, 4])
print(cupy.asnumpy(cupy.array(input)))
print(np.array(list(map(f, input))))

p = pool.Pool(4)
output = p.map(f, input)
p.close()
p.join()
print(output)

The output is the following: [1 2 3 4] [2 4 6 8] Exception in …
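A frequently suggested explanation (assuming a Linux, fork-based pool) is that the CUDA context created when the parent first touched the GPU cannot be reused inside forked workers; one workaround is to create the pool with the spawn start method so each worker initializes CUDA on its own. A minimal sketch of that idea:

```python
import multiprocessing as mp
import numpy as np
import cupy

def f(x):
    # Each spawned worker builds its own CUDA context the first time it touches cupy.
    return cupy.asnumpy(2 * cupy.array(x))

if __name__ == "__main__":
    data = np.array([1, 2, 3, 4])
    print(cupy.asnumpy(cupy.array(data)))   # GPU used in the parent first
    ctx = mp.get_context("spawn")           # children do not inherit the parent's CUDA state
    with ctx.Pool(4) as p:
        print(p.map(f, list(data)))
```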

Why is more than one worker used in `multiprocessing.Pool().apply_async()`?

空扰寡人 submitted on 2019-12-10 10:17:12
Question: Problem: from the multiprocessing.Pool docs:

apply_async(func ...): A variant of the apply() method which returns a result object. ...

Reading further ...

apply(func[, args[, kwds]]): Call func with arguments args and keyword arguments kwds. It blocks until the result is ready. Given this blocks, apply_async() is better suited for performing work in parallel. Additionally, func is only executed in one of the workers of the pool.

The last bold line suggests only one worker from the pool is used …
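That sentence in the docs describes a single call: each apply_async() runs its func on exactly one worker, but issuing many apply_async() calls spreads them across all workers in the pool. A small illustrative sketch that prints which worker process handled each task:

```python
import os
import time
from multiprocessing import Pool

def work(i):
    time.sleep(0.2)                  # give other workers a chance to pick up tasks
    return i, os.getpid()

if __name__ == "__main__":
    with Pool(4) as pool:
        results = [pool.apply_async(work, (i,)) for i in range(8)]
        for r in results:
            i, pid = r.get()
            print(f"task {i} ran in worker {pid}")   # several distinct pids appear
```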

Pool workers do not complete all tasks

有些话、适合烂在心里 submitted on 2019-12-10 10:07:17
Question: I have a relatively simple Python multiprocessing script that sets up a pool of workers which append output to a pandas DataFrame by way of a custom manager. What I am finding is that when I call close()/join() on the pool, not all of the tasks submitted with apply_async are being completed. Here is a simplified example that submits 1000 jobs but only half complete, causing an assertion error. Have I overlooked something very simple, or is this perhaps a bug? from pandas import DataFrame from …
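One common reason tasks seem to go missing with apply_async is that exceptions raised inside a worker stay invisible until .get() is called on the corresponding AsyncResult; a minimal sketch of collecting the results explicitly (using a plain return value rather than the question's managed DataFrame):

```python
from multiprocessing import Pool

def job(i):
    if i % 2:                          # simulate tasks that fail inside the worker
        raise ValueError(f"job {i} failed")
    return i

if __name__ == "__main__":
    with Pool(4) as pool:
        handles = [pool.apply_async(job, (i,)) for i in range(10)]
        done, failed = [], []
        for h in handles:
            try:
                done.append(h.get())   # re-raises any exception from the worker
            except ValueError as exc:
                failed.append(str(exc))
    print(len(done), "completed,", len(failed), "failed")
```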

Using keras.utils.Sequence with multiprocessing and a database - when to connect?

有些话、适合烂在心里 submitted on 2019-12-10 10:06:49
Question: I'm training a neural network with Keras using the TensorFlow backend. The data set does not fit in RAM, so I store it in a MongoDB database and retrieve batches using a subclass of keras.utils.Sequence. Everything works fine if I run model.fit_generator() with use_multiprocessing=False. When I turn on multiprocessing, I get errors either during the spawning of workers or while connecting to the database. If I create a connection in __init__, I get an exception whose text says something about …
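The usual advice for fork-unsafe clients such as pymongo is to open the connection lazily inside the worker process (for example on first __getitem__) rather than in __init__; a rough sketch under that assumption, with the collection name, document fields, and batching logic invented for illustration:

```python
import numpy as np
from tensorflow import keras            # assumes the TF-bundled Keras
from pymongo import MongoClient

class MongoSequence(keras.utils.Sequence):
    def __init__(self, n_samples, batch_size):
        self.n_samples = n_samples
        self.batch_size = batch_size
        self._client = None              # do NOT connect here; workers are forked/spawned later

    def _collection(self):
        if self._client is None:         # first access happens inside the worker process
            self._client = MongoClient("mongodb://localhost:27017", connect=False)
        return self._client.mydb.samples # hypothetical database/collection

    def __len__(self):
        return self.n_samples // self.batch_size

    def __getitem__(self, idx):
        docs = list(self._collection().find().skip(idx * self.batch_size).limit(self.batch_size))
        x = np.array([d["features"] for d in docs])   # hypothetical document fields
        y = np.array([d["label"] for d in docs])
        return x, y
```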

dask computation not executing in parallel

混江龙づ霸主 submitted on 2019-12-10 03:53:59
Question: I have a directory of JSON files that I am trying to convert to a dask DataFrame and save to castra. There are 200 files containing O(10**7) JSON records between them. The code is very simple, largely following the tutorial examples:

import dask.dataframe as dd
import dask.bag as db
import json

txt = db.from_filenames('part-*.json')
js = txt.map(json.loads)
df = js.to_dataframe()
cs = df.to_castra("data.castra")

I am running it on a 32-core machine, but the code only utilizes one core at 100%. My …
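Hard to be sure without profiling, but one commonly cited cause is that json.loads is pure Python and holds the GIL, so a thread-based scheduler cannot use more than one core for the parsing; switching to the process-based scheduler is the usual first thing to try. A sketch assuming a reasonably recent dask (where from_filenames has become read_text), leaving the castra step aside:

```python
import json
import dask
import dask.bag as db

# Pure-Python JSON parsing holds the GIL, so ask dask to run tasks in
# separate processes instead of threads.
dask.config.set(scheduler="processes")

txt = db.read_text("part-*.json")       # newer-API equivalent of from_filenames
records = txt.map(json.loads)
df = records.to_dataframe()
print(df.head())                         # head() triggers a small parallel compute
```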

Python Multiprocessing RuntimeError on Windows

一笑奈何 submitted on 2019-12-10 02:52:13
Question: I have a class function (let's call it "alpha.py") that uses multiprocessing (processes=2) to fork a process, and it is part of a Python package that I wrote. In a separate Python script (let's call it "beta.py"), I instantiate an object of this class and call the corresponding function that uses multiprocessing. Finally, all of this is wrapped inside a wrapper Python script (let's call it "gamma.py") that handles many different class objects and functions. Essentially: Run ./gamma.py from …
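On Windows, multiprocessing uses the spawn start method, so the entry script (here gamma.py) is re-imported in every child; the classic RuntimeError goes away once everything that ultimately creates processes sits behind the __main__ guard. A hypothetical sketch of gamma.py (run_beta and the module layout are made up):

```python
# gamma.py -- hypothetical entry script; on Windows the children are spawned by
# re-importing this file, so anything that starts processes must be behind the guard.
from multiprocessing import freeze_support
from beta import run_beta           # hypothetical helper that ends up calling alpha's pool

def main():
    run_beta()                       # eventually calls the method that creates Pool(processes=2)

if __name__ == "__main__":
    freeze_support()                 # only needed for frozen executables, harmless otherwise
    main()
```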

Python Multiprocessing concurrency using Manager, Pool and a shared list not working

非 Y 不嫁゛ submitted on 2019-12-09 17:38:50
Question: I am learning Python multiprocessing, and I am trying to use this feature to populate a list with all the files present on the OS. However, the code that I wrote executes only sequentially.

#!/usr/bin/python
import os
import multiprocessing

tld = [os.path.join("/", f) for f in os.walk("/").next()[1]]  # gets the top-level directory names inside "/"
manager = multiprocessing.Manager()
files = manager.list()

def get_files(x):
    for root, dir, file in os.walk(x):
        for name in file:
            files.append(os …
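The excerpt cuts off before anything actually starts worker processes; a sketch of one way to fan the directory walk out over a Pool, having each worker return its own list and merging in the parent (illustrative, not the asker's exact approach with a Manager list):

```python
import os
from multiprocessing import Pool

def get_files(top):
    # Collect paths locally and return them; cheaper than appending to a Manager list.
    found = []
    for root, dirs, names in os.walk(top):
        for name in names:
            found.append(os.path.join(root, name))
    return found

if __name__ == "__main__":
    tld = [os.path.join("/", d) for d in next(os.walk("/"))[1]]   # top-level directories
    with Pool(processes=4) as pool:
        all_files = [path for sub in pool.map(get_files, tld) for path in sub]
    print(len(all_files), "files found")
```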