python-multiprocessing

Python multiprocessing: Manager initiates process spawn loop

元气小坏坏 submitted on 2019-12-04 15:27:23
I have a simple Python multiprocessing script that sets up a pool of workers that attempt to append work output to a Manager list. The script has a three-level call stack: main calls f1, which spawns several worker processes that call another function, g1. When one attempts to debug the script (incidentally on Windows 7/64-bit/VS 2010/PyTools), it runs into a nested process-creation loop, spawning an endless number of processes. Can anyone determine why? I'm sure I am missing something very simple. Here's the problematic code: import multiprocessing import logging manager = multiprocessing
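The usual culprit on Windows is module-level multiprocessing setup: the spawn start method re-imports the module in every child, so a Manager() created at module level starts a new manager process (and pool) on each import. A minimal sketch of the fix, keeping the question's f1/g1 structure with placeholder bodies:

```python
import multiprocessing
import logging


def g1(item, shared_results):
    # placeholder worker body: record a result for the given item
    shared_results.append(item * 2)


def f1(shared_results):
    # spawn several workers that each call g1
    with multiprocessing.Pool(processes=4) as pool:
        pool.starmap(g1, [(i, shared_results) for i in range(10)])


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    # Manager and Pool are created only in the parent, never on re-import
    manager = multiprocessing.Manager()
    results = manager.list()
    f1(results)
    print(list(results))
```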

Python tornado with multi-process

岁酱吖の submitted on 2019-12-04 15:14:41
I found out how to run Tornado with multiple processes: server = HTTPServer(app) server.bind(8888) server.start(0) #Forks multiple sub-processes IOLoop.current().start() In this situation, is there any way to share resources across the processes? It also seems that they all use the same port. Does Tornado balance the load between the processes itself? If so, how does it do that? In general, when using multi-process mode the processes only communicate via external services: databases, cache servers, message queues, etc. There are some additional options available for processes that are running on the same machine
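A minimal sketch (my own, not taken from the answer) of the fork-based setup in the question; each response reports os.getpid() so you can observe that the OS kernel, not Tornado, distributes incoming connections among the children that all accept() on the shared listening socket:

```python
import os
import tornado.web
from tornado.httpserver import HTTPServer
from tornado.ioloop import IOLoop


class PidHandler(tornado.web.RequestHandler):
    def get(self):
        self.write("handled by process %d\n" % os.getpid())


if __name__ == "__main__":
    app = tornado.web.Application([(r"/", PidHandler)])
    server = HTTPServer(app)
    server.bind(8888)   # the listening socket is created before forking
    server.start(0)     # fork one sub-process per CPU core
    IOLoop.current().start()
```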

multiprocessing.pool.MaybeEncodingError: Error sending result: Reason: 'TypeError("cannot serialize '_io.BufferedReader' object",)'

孤者浪人 submitted on 2019-12-04 15:01:56
I get the following error: multiprocessing.pool.MaybeEncodingError: Error sending result: '<multiprocessing.pool.ExceptionWithTraceback object at 0x7f758760d6a0>'. Reason: 'TypeError("cannot serialize '_io.BufferedReader' object",)' When running this code: from operator import itemgetter from multiprocessing import Pool import wget def f(args): print(args[1]) wget.download(args[1], "tests/" + target + '/' + str(args[0]), bar=None) if __name__ == "__main__": a = Pool(2) a.map(f, list(enumerate(urls))) #urls is a list of urls. What does the error mean and how can I fix it? First couple of
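The error typically means that the exception raised inside the worker (wget's HTTP error carries an open '_io.BufferedReader') cannot be pickled back to the parent process. A hedged sketch of one common workaround, with placeholder values standing in for the question's urls and target: catch exceptions in the worker and return a plain, picklable status instead.

```python
from multiprocessing import Pool
import wget

target = "example"                        # placeholder for the question's `target`
urls = ["https://example.com/a.txt",      # placeholder URL list
        "https://example.com/b.txt"]


def f(args):
    index, url = args
    try:
        wget.download(url, "tests/" + target + "/" + str(index), bar=None)
        return (index, "ok")
    except Exception as exc:
        # return a string, not the raw exception object, so the result pickles
        return (index, "failed: %s" % exc)


if __name__ == "__main__":
    with Pool(2) as pool:
        for index, status in pool.map(f, list(enumerate(urls))):
            print(index, status)
```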

Python 3 multiprocessing: optimal chunk size

拜拜、爱过 submitted on 2019-12-04 13:06:22
Question: How do I find the optimal chunk size for multiprocessing.Pool instances? I used this before to create a generator of n sudoku objects: processes = multiprocessing.cpu_count() worker_pool = multiprocessing.Pool(processes) sudokus = worker_pool.imap_unordered(create_sudoku, range(n), n // processes + 1) To measure the time, I use time.time() before the snippet above, then I initialize the pool as described, then I convert the generator into a list (list(sudokus)) to trigger generating the
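For reference, CPython's own Pool.map heuristic is roughly divmod(len(iterable), processes * 4), i.e. about four chunks per worker. A small sketch for measuring candidate chunk sizes empirically; create_sudoku is stubbed out here because the real generator isn't shown in the excerpt:

```python
import multiprocessing
import time


def create_sudoku(seed):
    # stand-in for the real, comparatively expensive sudoku generator
    return sum(i * i for i in range(10000)) + seed


def timed_run(n, chunksize):
    processes = multiprocessing.cpu_count()
    with multiprocessing.Pool(processes) as pool:
        start = time.time()
        list(pool.imap_unordered(create_sudoku, range(n), chunksize))
        return time.time() - start


if __name__ == "__main__":
    n = 20000
    for chunksize in (1, 10, 100, n // (multiprocessing.cpu_count() * 4) + 1):
        print("chunksize %6d: %.3f s" % (chunksize, timed_run(n, chunksize)))
```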

Multiprocessing: why is a numpy array shared with the child processes, while a list is copied?

烂漫一生 submitted on 2019-12-04 12:52:25
I used this script (see code at the end) to assess whether a global object is shared or copied when the parent process is forked. Briefly, the script creates a global data object, and the child processes iterate over data. The script also monitors the memory usage to assess whether the object was copied in the child processes. Here are the results: data = np.ones((N,N)). Operation in the child process: data.sum(). Result: data is shared (no copy). data = list(range(pow(10, 8))). Operation in the child process: sum(data). Result: data is copied. data = list(range(pow(10, 8))). Operation
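The short explanation: after fork(), memory is shared copy-on-write. Summing a Python list updates every element's reference count, which dirties those pages and forces private copies, whereas numpy's sum only reads one contiguous buffer, so its pages stay shared. A minimal sketch illustrating this, assuming a fork-capable platform (Linux/macOS) and psutil installed; it measures USS, the memory unique to the child process:

```python
import multiprocessing
import os

import numpy as np
import psutil


def uss_mb():
    # USS = memory unique to this process; it grows when CoW pages get copied
    return psutil.Process(os.getpid()).memory_full_info().uss / 1e6


arr = np.ones((4000, 4000))      # ~128 MB in one contiguous float64 buffer
lst = list(range(10 ** 7))       # ten million separate int objects


def sum_array():
    before = uss_mb()
    arr.sum()                    # reads the buffer; refcounts untouched
    print("numpy array: child USS grew by %.0f MB" % (uss_mb() - before))


def sum_list():
    before = uss_mb()
    sum(lst)                     # touches every element's refcount -> copies
    print("python list: child USS grew by %.0f MB" % (uss_mb() - before))


if __name__ == "__main__":
    multiprocessing.set_start_method("fork")   # CoW only applies when forking
    for target in (sum_array, sum_list):
        p = multiprocessing.Process(target=target)
        p.start()
        p.join()
```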

Optimizing multiprocessing.Pool with expensive initialization

自古美人都是妖i submitted on 2019-12-04 09:19:12
Here is a complete, simple working example: import multiprocessing as mp import time import random class Foo: def __init__(self): # some expensive set up function in the real code self.x = 2 print('initializing') def run(self, y): time.sleep(random.random() / 10.) return self.x + y def f(y): foo = Foo() return foo.run(y) def main(): pool = mp.Pool(4) for result in pool.map(f, range(10)): print(result) pool.close() pool.join() if __name__ == '__main__': main() How can I modify it so Foo is only initialized once by each worker, not once per task? Basically I want the init called 4 times, not 10. I am
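A sketch of the standard approach (assumed, not the accepted answer verbatim): Pool's initializer argument runs once in each worker process, so Foo() is constructed four times for a Pool(4) no matter how many tasks are mapped.

```python
import multiprocessing as mp
import random
import time


class Foo:
    def __init__(self):
        # expensive setup in the real code
        self.x = 2
        print('initializing')            # printed once per worker

    def run(self, y):
        time.sleep(random.random() / 10.)
        return self.x + y


_worker_foo = None


def init_worker():
    global _worker_foo
    _worker_foo = Foo()                  # built once per worker process


def f(y):
    return _worker_foo.run(y)            # every task reuses its worker's Foo


def main():
    with mp.Pool(4, initializer=init_worker) as pool:
        for result in pool.map(f, range(10)):
            print(result)


if __name__ == '__main__':
    main()
```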

Parallelize this nested for loop in python

狂风中的少年 submitted on 2019-12-04 07:56:14
I'm struggling again to improve the execution time of this piece of code. Since the calculations are really time-consuming, I think the best solution would be to parallelize the code. I was first working with maps as explained in this question, but then I tried a simpler approach, thinking I could find a better solution. However, I haven't been able to come up with anything yet, and since it's a different problem, I decided to post it as a new question. I am working on a Windows platform, using Python 3.4. Here's the code: similarity_matrix = [[0 for x in range(word_count)] for x in range(word
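The excerpt cuts off before the loop body, so here is a hedged sketch of one common way to split such a nested loop: each worker computes one full row of the similarity matrix. words and compute_similarity are placeholders for the question's data and scoring function, and the __main__ guard matters on Windows, where the asker is running:

```python
import multiprocessing
from functools import partial


def compute_similarity(a, b):
    # placeholder metric: Jaccard similarity of the character sets
    return len(set(a) & set(b)) / max(len(set(a) | set(b)), 1)


def similarity_row(i, words):
    # one worker call produces one complete row of the matrix
    return [compute_similarity(words[i], w) for w in words]


if __name__ == "__main__":
    words = ["apple", "apply", "ample", "maple", "angle"]   # placeholder data
    with multiprocessing.Pool() as pool:
        similarity_matrix = pool.map(partial(similarity_row, words=words),
                                     range(len(words)))
    for row in similarity_matrix:
        print(["%.2f" % v for v in row])
```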

How to read serial data with multiprocessing in python?

二次信任 submitted on 2019-12-04 06:48:49
Question: I have a device that outputs data at irregular intervals. I want to write the data to a CSV file in 2-second intervals, so I figured multiprocessing with a queue might work. Here I'm trying to just pass data from one process to another, but I get a SerialException. Also, I'm unable to run it from IDLE, so I'm stuck using the terminal; as a result, the error message closes as soon as it opens. Here's the code: import multiprocessing import time import datetime import serial try: fio2_ser = serial
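The code is cut off, so here is a rough sketch of the producer/consumer shape being described, assuming pyserial, a device on COM3 at 9600 baud, and one reading per line. One process reads the port as data arrives and pushes readings onto a Queue; the other drains the Queue and appends CSV rows every 2 seconds, decoupling the irregular input rate from the fixed output cadence:

```python
import csv
import datetime
import multiprocessing
import time

import serial


def read_serial(queue):
    ser = serial.Serial("COM3", 9600, timeout=1)   # assumed port and baud rate
    while True:
        line = ser.readline().decode(errors="ignore").strip()
        if line:
            queue.put((datetime.datetime.now().isoformat(), line))


def write_csv(queue):
    with open("readings.csv", "a", newline="") as f:
        writer = csv.writer(f)
        while True:
            time.sleep(2)                          # fixed 2-second cadence
            rows = []
            while not queue.empty():               # drain whatever has arrived
                rows.append(queue.get())
            writer.writerows(rows)
            f.flush()


if __name__ == "__main__":
    q = multiprocessing.Queue()
    reader = multiprocessing.Process(target=read_serial, args=(q,))
    consumer = multiprocessing.Process(target=write_csv, args=(q,), daemon=True)
    reader.start()
    consumer.start()
    reader.join()
```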

Python Multiprocessing concurrency using Manager, Pool and a shared list not working

谁说胖子不能爱 submitted on 2019-12-04 06:42:23
I am learning Python multiprocessing, and I am trying to use this feature to populate a list with all the files present on the OS. However, the code that I wrote executes only sequentially. #!/usr/bin/python import os import multiprocessing tld = [os.path.join("/", f) for f in os.walk("/").next()[1]] # Gets the top-level directory names inside "/" manager = multiprocessing.Manager() files = manager.list() def get_files(x): for root, dir, file in os.walk(x): for name in file: files.append(os.path.join(root, name)) mp = [multiprocessing.Process(target=get_files, args=(tld[x],)) for x in range
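The excerpt stops before the processes are started, but a common reason such code runs sequentially is joining each process immediately after starting it, which makes the parent wait for one worker before launching the next. A hedged sketch of the start-all-then-join-all pattern, keeping the question's Manager list (and using Python 3's next()):

```python
import multiprocessing
import os


def get_files(top, files):
    for root, dirs, names in os.walk(top):
        for name in names:
            files.append(os.path.join(root, name))


if __name__ == "__main__":
    # top-level directory names inside "/"
    tld = [os.path.join("/", d) for d in next(os.walk("/"))[1]]

    manager = multiprocessing.Manager()
    files = manager.list()

    procs = [multiprocessing.Process(target=get_files, args=(d, files))
             for d in tld]
    for p in procs:          # start every worker first...
        p.start()
    for p in procs:          # ...then wait for all of them
        p.join()

    print(len(files), "files found")
```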

Does pydispatcher run the handler function in a background thread?

空扰寡人 submitted on 2019-12-04 06:00:42
Question: While looking up event-handler modules, I came across pydispatcher, which seemed beginner-friendly. My use case for the library is that I want to send a signal when my queue size goes over a threshold. The handler function can then start processing and removing items from the queue (and subsequently do a bulk insert into the database). I would like the handler function to run in the background. I am aware that I could simply override the queue.append() method, checking the queue size and calling
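The excerpt ends mid-sentence, but as far as I know PyDispatcher's dispatcher.send() invokes the connected handlers synchronously in the sender's thread, so there is no built-in background execution. A hedged sketch of pushing the work onto a thread inside the handler yourself; the signal name and item payload are assumptions for illustration:

```python
import threading

from pydispatch import dispatcher

QUEUE_FULL = "queue-full"            # assumed signal name for this sketch


def drain_queue(items):
    # the slow part (e.g. a bulk database insert) runs in its own thread
    print("draining %d items in %s" % (len(items), threading.current_thread().name))


def on_queue_full(sender, items):
    # the handler itself returns immediately; the heavy work is backgrounded
    threading.Thread(target=drain_queue, args=(items,)).start()


dispatcher.connect(on_queue_full, signal=QUEUE_FULL)

# send() blocks until every handler has returned, but each handler only
# starts a thread, so the sender is not held up while the queue is drained
dispatcher.send(signal=QUEUE_FULL, sender="queue", items=list(range(150)))
```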