python-multiprocessing

Chunksize irrelevant for multiprocessing / pool.map in Python?

心已入冬 Submitted on 2019-12-03 21:01:44
I am trying to use the multiprocessing Pool functionality of Python. Regardless of how I set the chunk size (under Windows 7 and Ubuntu; the latter, with 4 cores, is shown below), the number of parallel processes seems to stay the same.

    from multiprocessing import Pool
    from multiprocessing import cpu_count
    import multiprocessing
    import time

    def f(x):
        print("ready to sleep", x, multiprocessing.current_process())
        time.sleep(20)
        print("slept with:", x, multiprocessing.current_process())

    if __name__ == '__main__':
        processes = cpu_count()
        print('-' * 20)
        print('Utilizing %d cores' % processes)
        print('-' * 20)
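A minimal sketch (the worker function and iterable are illustrative, not from the question) of why chunksize changes nothing here: chunksize only controls how many items are batched into each task, while the number of simultaneously running workers is fixed by the Pool size.

    import time
    from multiprocessing import Pool, current_process

    def work(x):
        # report which worker process handled each item
        print(current_process().name, "got", x)
        time.sleep(1)
        return x * x

    if __name__ == '__main__':
        with Pool(processes=4) as pool:  # parallelism: always 4 workers
            # chunksize=5 hands each worker 5 items per task; with 20 items
            # that is 4 tasks, still executed by only 4 processes at once
            print(pool.map(work, range(20), chunksize=5))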

unexpected memory footprint differences when spawning python multiprocessing pool

有些话、适合烂在心里 Submitted on 2019-12-03 16:44:48
While trying to contribute some optimization for the parallelization in the pystruct module, and while trying to explain in discussions why I wanted to instantiate pools as early in the execution as possible and keep them around as long as possible, reusing them, I realized that I know this works best, but I don't completely know why. I know that the claim, on *nix systems, is that a pool worker subprocess copies on write from all the globals in the parent process. This is
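A minimal sketch of the pattern being discussed (names are illustrative, not from pystruct): on fork-based platforms, workers forked before a large allocation never see those pages at all, whereas workers forked afterwards share them copy-on-write and only pay a memory cost once pages are written.

    import os
    from multiprocessing import Pool

    def ping(_):
        # trivial worker that touches none of the parent's data
        return os.getpid()

    if __name__ == '__main__':
        # pool instantiated EARLY: forked workers carry only the
        # small startup state of the parent process
        pool = Pool(processes=4)

        big_data = [0] * 50_000_000  # large allocation after the fork

        print(pool.map(ping, range(4)))
        pool.close()
        pool.join()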

Is it possible to prioritise a lock?

。_饼干妹妹 Submitted on 2019-12-03 16:23:10
I have a multiprocessing program where one process adds elements to a shared list (multiprocessing.Manager().list()) and several other processes consume these elements from that list (and remove them); they run as long as there is something to process in the list and the process above is still adding to it. I implemented locking (via multiprocessing.Lock()) when adding to the list or removing from it. Since there is one "feeder" process and several (10-40) "consumer" processes all competing for the lock, and the consumer processes are fast, I end up with the "feeder" process having a hard
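One common workaround, sketched below (the Event-based back-off is an assumption, not part of the original question): let consumers voluntarily yield whenever the feeder signals that it is waiting for the lock, which gives the feeder effective priority.

    import time
    from multiprocessing import Lock, Event

    lock = Lock()
    feeder_waiting = Event()

    def feeder_add(shared_list, item):
        feeder_waiting.set()          # ask consumers to back off
        with lock:
            shared_list.append(item)
        feeder_waiting.clear()

    def consumer_take(shared_list):
        while feeder_waiting.is_set():
            time.sleep(0.001)         # yield while the feeder wants the lock
        with lock:
            return shared_list.pop(0) if shared_list else None

In a real program the Lock and Event would be passed to each worker process explicitly (for example via Pool initializer arguments) rather than used as module globals.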

AttributeError: 'Pool' object has no attribute '__exit__'

谁说胖子不能爱 Submitted on 2019-12-03 13:13:25
I'm writing some multiprocessing Python scripts using multiprocessing.Pool. These scripts look like the following:

    from multiprocessing import Pool

    def f(x):
        return x * x

    if __name__ == '__main__':
        with Pool(processes=4) as pool:    # start 4 worker processes
            print(pool.map(f, range(10)))  # prints "[0, 1, 4, ..., 81]"

When running this with Python 3.4, everything is fine. However, with Python 2.6 or 3.1 I get this error:

    AttributeError: 'Pool' object has no attribute '__exit__'

With Python 2.7 or 3.2, the error is essentially the same:

    AttributeError: __exit__

Why does this happen and how can
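Pool only gained context-manager support in Python 3.3, so on older interpreters the with statement fails. A version-portable sketch using an explicit try/finally instead:

    from multiprocessing import Pool

    def f(x):
        return x * x

    if __name__ == '__main__':
        pool = Pool(processes=4)
        try:
            print(pool.map(f, range(10)))
        finally:
            pool.close()   # stop accepting new tasks
            pool.join()    # wait for the workers to finish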

Python manager.dict() is very slow compared to regular dict

你离开我真会死。 Submitted on 2019-12-03 12:51:51
I have a dict to store objects:

    jobs = {}
    job = Job()
    jobs[job.name] = job

Now I want to convert it to a manager dict, because I want to use multiprocessing and need to share this dict among processes:

    mgr = multiprocessing.Manager()
    jobs = mgr.dict()
    job = Job()
    jobs[job.name] = job

Just by converting to manager.dict(), things got extremely slow. For example, with a native dict it took only 0.65 seconds to create 625 objects and store them in the dict. The very same task now takes 126 seconds! Is there any optimization I can do to keep manager.dict() on par with a plain Python {}? The problem is that
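A common mitigation, sketched with a hypothetical Job class: every access to a manager.dict() is an IPC round trip to the manager process, so build the data in a plain local dict and push it to the shared one with a single update() call.

    import multiprocessing

    class Job:
        def __init__(self, name):
            self.name = name

    if __name__ == '__main__':
        mgr = multiprocessing.Manager()
        jobs = mgr.dict()

        local = {}                 # plain dict: no IPC per access
        for i in range(625):
            job = Job('job-%d' % i)
            local[job.name] = job

        jobs.update(local)         # one round trip instead of 625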

multiprocessing.Pool in jupyter notebook works on linux but not windows

穿精又带淫゛_ Submitted on 2019-12-03 11:31:36
I'm trying to run a few independent computations (though reading from the same data). My code works when I run it on Ubuntu, but not on Windows (Windows Server 2012 R2), where I get the error 'module' object has no attribute ... when I try to use multiprocessing.Pool (it appears in the kernel console, not as output in the notebook itself). (I already made the mistake of defining the function after creating the pool, and I have corrected it; that is not the problem.) This happens even
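The usual cause is that Windows has no fork: Pool workers re-import the worker function by its module path, which fails for functions defined inside a notebook. A common fix, sketched with a hypothetical module name, is to move the function into an importable module:

    # worker.py -- a real module on disk, importable by spawned processes
    def compute(x):
        return x * x

and then in the notebook cell:

    from multiprocessing import Pool
    import worker

    if __name__ == '__main__':
        with Pool(processes=4) as pool:
            print(pool.map(worker.compute, range(10)))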

python running coverage on never ending process

纵然是瞬间 Submitted on 2019-12-03 11:22:01
I have a multi-process web server whose processes never end, and I would like to check my code coverage for the whole project in a live environment (not only from tests). The problem is that, since the processes never end, I don't have a good place to set the cov.start(), cov.stop(), cov.save() hooks. Therefore, I thought about spawning a thread that, in an infinite loop, saves and combines the coverage data and then sleeps for some time; however, this approach doesn't work: the coverage report seems to be empty, except for the sleep line. I would be happy to receive any ideas about how to get the
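A sketch of one way to approach this with the documented coverage.py API (the save interval and daemon-thread setup are assumptions): start tracing once in each long-lived process and let the background thread only save. Calling cov.start() from inside the saver thread would record little beyond that thread's own sleep loop, which matches the symptom described.

    import threading
    import time
    import coverage

    # start tracing in the process itself, not in the saver thread
    cov = coverage.Coverage(data_suffix=True)  # unique data file per process
    cov.start()

    def save_forever(interval=60):
        while True:
            time.sleep(interval)
            cov.save()                         # flush data without stopping

    threading.Thread(target=save_forever, daemon=True).start()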

How to solve memory issues problems while multiprocessing using Pool.map()?

回眸只為那壹抹淺笑 Submitted on 2019-12-03 03:49:17
I have written the program (below) to: read a huge text file as a pandas DataFrame, then group by a specific column value to split the data and store it as a list of DataFrames, then pipe the data to Pool.map() to process each DataFrame in parallel. Everything is fine and the program works well on my small test dataset. But when I pipe in my large data (about 14 GB), the memory consumption increases exponentially and then the computer freezes or the job gets killed (on an HPC cluster). I have
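A common memory-saving pattern, sketched with hypothetical file and column names: avoid materializing the full list of DataFrames, and feed the groups to the pool lazily with imap so only a few chunks are in flight at once.

    import pandas as pd
    from multiprocessing import Pool

    def process(group):
        key, df = group
        return key, len(df)       # hypothetical per-group work

    if __name__ == '__main__':
        df = pd.read_csv('huge.txt', sep='\t')           # hypothetical input
        groups = ((k, g) for k, g in df.groupby('col'))  # generator, not a list

        with Pool(processes=4) as pool:
            # imap consumes the generator lazily instead of queueing
            # every group up front
            for key, nrows in pool.imap(process, groups, chunksize=1):
                print(key, nrows)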
