python-multiprocessing

Sharing many queues among processes in Python

天大地大妈咪最大 submitted on 2019-12-17 19:16:27
Question: I am aware of multiprocessing.Manager() and how it can be used to create shared objects, in particular queues that can be shared between workers. There is this question, this question, this question and even one of my own questions. However, I need to define a great many queues, each of which links a specific pair of processes. Say that each pair of processes and its linking queue is identified by the variable key. I want to use a dictionary to access my queues when I need to put and
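A minimal sketch of one way this can be set up, assuming a Manager so the queue proxies can be handed to workers inside an ordinary dict (the keys, the worker function, and the message are placeholders, not part of the question):

```python
import multiprocessing

def worker(key, queues):
    # Each worker looks up the queue for its own pair by key.
    queues[key].put("message for pair {}".format(key))

if __name__ == "__main__":
    manager = multiprocessing.Manager()
    # One manager-backed queue per pair of processes, keyed however you like.
    keys = [("a", "b"), ("b", "c"), ("a", "c")]
    queues = {key: manager.Queue() for key in keys}

    procs = [multiprocessing.Process(target=worker, args=(key, queues))
             for key in keys]
    for p in procs:
        p.start()
    for p in procs:
        p.join()

    for key in keys:
        print(key, queues[key].get())
```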

_multiprocessing.SemLock is not implemented when running on AWS Lambda

瘦欲@ submitted on 2019-12-17 17:57:10
Question: I have a short piece of code that uses the multiprocessing package and works fine on my local machine. When I uploaded it to AWS Lambda and ran it there, I got the following error (stacktrace trimmed): [Errno 38] Function not implemented: OSError Traceback (most recent call last): File "/var/task/recorder.py", line 41, in record pool = multiprocessing.Pool(10) File "/usr/lib64/python2.7/multiprocessing/__init__.py", line 232, in Pool return Pool(processes, initializer, initargs, maxtasksperchild) File "/usr
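For context, a sketch of the workaround usually suggested for this environment: AWS Lambda does not provide the shared-memory facilities that multiprocessing.Pool (and its SemLock) rely on, so a thread-backed pool from multiprocessing.dummy, which exposes the same API, is one way around the error. The record_one helper below is a placeholder, not the original recorder.py code:

```python
from multiprocessing.dummy import Pool  # thread-based pool, no SemLock needed

def record_one(item):
    # Placeholder for the real per-item work from recorder.py.
    return item

def record(items):
    pool = Pool(10)
    try:
        return pool.map(record_one, items)
    finally:
        pool.close()
        pool.join()

if __name__ == "__main__":
    print(record(range(20)))
```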

Multiprocessing AsyncResult.get() hangs in Python 3.7.2 but not in 3.6

青春壹個敷衍的年華 submitted on 2019-12-17 17:06:38
Question: I'm trying to port some code from Python 3.6 to Python 3.7 on Windows 10. I see the multiprocessing code hang when calling .get() on the AsyncResult object. The code in question is much more complicated, but I've boiled it down to something similar to the following program. import multiprocessing def main(num_jobs): num_processes = max(multiprocessing.cpu_count() - 1, 1) pool = multiprocessing.Pool(num_processes) func_args = [] results = [] try: for num in range(num_jobs): args = (1, 2, 3)
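The excerpt is cut off, so the following is a hedged reconstruction (not the asker's actual code) of what such a boiled-down program typically looks like: jobs are submitted with apply_async, and the hang described above is then observed when .get() is called on the returned AsyncResult objects:

```python
import multiprocessing

def work(a, b, c):
    # Stand-in for the real job; the original function is not shown in the excerpt.
    return a + b + c

def main(num_jobs):
    num_processes = max(multiprocessing.cpu_count() - 1, 1)
    pool = multiprocessing.Pool(num_processes)
    results = []
    try:
        for _ in range(num_jobs):
            results.append(pool.apply_async(work, (1, 2, 3)))
        # .get() is where the hang on 3.7.2 is reported.
        print([r.get(timeout=60) for r in results])
    finally:
        pool.close()
        pool.join()

if __name__ == "__main__":
    main(4)
```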

What's the difference between ThreadPool and Pool in the Python multiprocessing module

霸气de小男生 submitted on 2019-12-17 10:15:40
Question: What's the difference between ThreadPool and Pool in the multiprocessing module? When I try my code out, this is the main difference I see: from multiprocessing import Pool import os, time print("hi outside of main()") def hello(x): print("inside hello()") print("Process id: ", os.getpid()) time.sleep(3) return x*x if __name__ == "__main__": p = Pool(5) pool_output = p.map(hello, range(3)) print(pool_output) I see the following output: hi outside of main() hi outside of main() hi outside of main(
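For comparison, a minimal sketch of the thread-based counterpart: ThreadPool lives in multiprocessing.pool (multiprocessing.dummy.Pool is the same thing) and exposes the same map API, but its workers are threads inside the parent process, so os.getpid() reports the same id for every call and the module-level print runs only once:

```python
import os
import time
from multiprocessing.pool import ThreadPool  # thread workers, same Pool API

def hello(x):
    # All workers share the parent process, so the pid is identical here.
    print("inside hello(), process id:", os.getpid())
    time.sleep(3)
    return x * x

if __name__ == "__main__":
    with ThreadPool(5) as p:
        print(p.map(hello, range(3)))
```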

Process.join() and queue don't work with large numbers [duplicate]

↘锁芯ラ submitted on 2019-12-17 09:35:25
Question: This question already has an answer here: Script using multiprocessing module does not terminate (1 answer). Closed 4 years ago. I am trying to split a for loop, i.e. N = 1000000 for i in xrange(N): #do something using multiprocessing.Process, and it works well for small values of N. The problem arises when I use bigger values of N. Something strange happens before or during p.join() and the program doesn't respond. If I put print i instead of q.put(i) in the definition of the function f, everything
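A sketch of the usual fix for this pattern (Python 3 syntax; the names mirror the question's f, q, and N): drain the queue in the parent before blocking on join(), because a child that still has items buffered in an mp.Queue cannot finish flushing them until someone reads them, which is what makes join() appear to hang for large N:

```python
import multiprocessing as mp

def f(n, q):
    for i in range(n):
        q.put(i)

if __name__ == "__main__":
    N = 1000000
    q = mp.Queue()
    p = mp.Process(target=f, args=(N, q))
    p.start()

    # Read everything first; joining before the queue is drained can deadlock,
    # since the child's feeder thread blocks once the underlying pipe is full.
    received = [q.get() for _ in range(N)]
    p.join()
    print(len(received))
```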

multiprocessing.pool.MaybeEncodingError: 'TypeError(“cannot serialize '_io.BufferedReader' object”,)'

筅森魡賤 submitted on 2019-12-17 07:54:28
Question: Why does the code below work only with multiprocessing.dummy, but not with plain multiprocessing? import urllib.request #from multiprocessing.dummy import Pool #this works from multiprocessing import Pool urls = ['http://www.python.org', 'http://www.yahoo.com', 'http://www.scala.org', 'http://www.google.com'] if __name__ == '__main__': with Pool(5) as p: results = p.map(urllib.request.urlopen, urls) Error: Traceback (most recent call last): File "urlthreads.py", line 31, in <module>
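A sketch of the usual way around this: the error comes from Pool.map trying to pickle the HTTPResponse object on its way back to the parent process, which multiprocessing.dummy avoids only because threads skip pickling. Reading the body inside the worker and returning plain bytes sidesteps the problem (the fetch helper is illustrative, not from the question):

```python
import urllib.request
from multiprocessing import Pool

def fetch(url):
    # Read the body inside the worker and return plain bytes,
    # which can be pickled back to the parent (the response object cannot).
    with urllib.request.urlopen(url) as response:
        return response.read()

urls = ['http://www.python.org', 'http://www.yahoo.com',
        'http://www.scala.org', 'http://www.google.com']

if __name__ == '__main__':
    with Pool(5) as p:
        results = p.map(fetch, urls)
    print([len(body) for body in results])
```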

Preserve custom attributes when pickling subclass of numpy array

吃可爱长大的小学妹 submitted on 2019-12-17 07:42:32
Question: I've created a subclass of numpy ndarray following the numpy documentation. In particular, I have added a custom attribute by modifying the code provided. I'm manipulating instances of this class within a parallel loop, using Python multiprocessing. As I understand it, the way that the scope is essentially 'copied' to multiple processes is by using pickle. The problem I am now coming up against relates to the way that numpy arrays are pickled. I can't find any comprehensive documentation about
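A minimal sketch of the approach usually given for this, assuming an ndarray subclass with a single extra attribute (here called info, purely as an example): extend __reduce__ and __setstate__ so the attribute travels with the array's own pickle state:

```python
import pickle
import numpy as np

class InfoArray(np.ndarray):
    def __new__(cls, input_array, info=None):
        obj = np.asarray(input_array).view(cls)
        obj.info = info
        return obj

    def __array_finalize__(self, obj):
        if obj is None:
            return
        self.info = getattr(obj, 'info', None)

    def __reduce__(self):
        # Append the custom attribute to ndarray's own pickle state.
        reconstruct, args, state = super().__reduce__()
        return (reconstruct, args, state + (self.info,))

    def __setstate__(self, state):
        self.info = state[-1]             # restore the custom attribute
        super().__setstate__(state[:-1])  # let ndarray restore the rest

if __name__ == "__main__":
    arr = InfoArray(np.arange(5), info="kept through pickling")
    copy = pickle.loads(pickle.dumps(arr))
    print(copy, copy.info)
```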

How to run independent transformations in parallel using PySpark?

邮差的信 submitted on 2019-12-16 20:06:42
Question: I am trying to run 2 functions doing completely independent transformations on a single RDD in parallel using PySpark. What are some methods to do this? def doXTransforms(sampleRDD): (X transforms) def doYTransforms(sampleRDD): (Y transforms) if __name__ == "__main__": sc = SparkContext(appName="parallelTransforms") sqlContext = SQLContext(sc) hive_context = HiveContext(sc) rows_rdd = hive_context.sql("select * from tables.X_table") p1 = Process(target=doXTransforms, args=(rows_rdd,)) p1
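One commonly suggested direction, sketched here with the newer SparkSession API rather than the SparkContext/HiveContext objects in the excerpt: a SparkContext cannot be pickled into a multiprocessing.Process, so actions are triggered from separate driver threads instead, letting the two transformations run as concurrent Spark jobs. The column names below are placeholders:

```python
from threading import Thread
from pyspark.sql import SparkSession

def do_x_transforms(df):
    # Placeholder for the real X transforms.
    df.groupBy("some_column").count().show()

def do_y_transforms(df):
    # Placeholder for the real Y transforms.
    df.select("another_column").distinct().show()

if __name__ == "__main__":
    spark = SparkSession.builder.appName("parallelTransforms").getOrCreate()
    rows_df = spark.sql("select * from tables.X_table")

    # Actions submitted from separate driver threads can run as
    # concurrent jobs on the same SparkContext, scheduling permitting.
    t1 = Thread(target=do_x_transforms, args=(rows_df,))
    t2 = Thread(target=do_y_transforms, args=(rows_df,))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
```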

Order of subprocess execution and its impact on operation atomicity

允我心安 submitted on 2019-12-14 04:00:16
Question: I'm learning the Python multiprocessing module and I've found this example (a slightly modified version): #!/bin/env python import multiprocessing as mp import random import string import time # Define an output queue output = mp.Queue() # define an example function def rand_string(length, output): time.sleep(1) """ Generates a random string of numbers, lower- and uppercase chars. """ rand_str = ''.join(random.choice( string.ascii_lowercase + string.ascii_uppercase + string.digits) for i in
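The excerpt cuts off mid-expression; the variant below is a hedged completion rather than the example's exact code, with a pos argument added (not in the excerpt) so it is visible which process produced each result and in what order the results arrive:

```python
import multiprocessing as mp
import random
import string
import time

output = mp.Queue()  # shared output queue

def rand_string(length, pos, output):
    """Generate a random string and record which process produced it."""
    time.sleep(1)
    rand_str = ''.join(random.choice(string.ascii_letters + string.digits)
                       for _ in range(length))
    output.put((pos, rand_str))

if __name__ == "__main__":
    processes = [mp.Process(target=rand_string, args=(5, pos, output))
                 for pos in range(4)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()

    # Results come back in completion order, not start order, which is
    # the subprocess-ordering behaviour the question title refers to.
    results = [output.get() for _ in processes]
    print(results)
```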