Question
I'm trying to speed up my code using ThreadPool.
The input is a dictionary with about 80000 elements, each of which is a list of 25 elements. I have to produce an output list for each element in the dictionary by processing and combining the items in each list.
All the lists can be analyzed independently, so this setting should be easily parallelizable.
Here's the basic setting I used for pool.map:
from multiprocessing.dummy import Pool as ThreadPool
pool = ThreadPool(NUM_THREADS)
output = pool.map(thread_compute, iterable_list)
pool.close()
pool.join()
Approach 1 (wrong): define the dictionary as global and have each thread get a key of the dictionary as input.
# in the main
global input_dict
iterable_list = input_dict.keys()

# thread_compute function
def thread_compute(key_i):
    list_to_combine = input_dict[key_i]
    # call a function
    processed_list = do_stuff_function(list_to_combine)
    return processed_list
I soon realized approach 1 will not work: even though the global variable input_dict is never written to (and so should be thread-safe), it is protected by the GIL (link 1, link 2) - a globally enforced lock on accessing Python objects from separate threads.
Approach 2 (not working): I created a list containing the same elements as input_dict, where each entry is a list of 25 items. This list is not a global variable, and each thread should be able to access an element (a 25-item list) without any overhead due to the GIL.
# in the main
items = list(input_dict.items())
iterable_list = [items[i][1] for i in range(len(items))]

# making sure the content is correct
assert(len(iterable_list) == len(input_dict))
for i in xrange(len(input_dict.keys())):
    assert(iterable_list[i] == input_dict[input_dict.keys()[i]])

# thread_compute function
def thread_compute(list_of_25_i):
    # call a function
    processed_list = do_stuff_function(list_of_25_i)
    return processed_list
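(As an aside, the same iterable can be built more directly; this is just a simplification of the construction above, not part of what I originally ran:)

# equivalent construction: values only, in the same order as input_dict.keys()
iterable_list = list(input_dict.values())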
Here are the execution times for 1, 2, 4 and 16 threads:
1: Done, (t=36.6810s).
2: Done, (t=46.5550s).
4: Done, (t=48.2722s).
16: Done, (t=48.9660s).
Why does adding threads cause such an increase in time? I am certain that this problem can benefit from multithreading, and I don't think the overhead of adding threads can be solely responsible for the increase.
Answer 1:
If your do_stuff_function is CPU-bound, then running it in multiple threads will not help, because the GIL only allows one thread to execute Python bytecode at a time.
The way around this in Python is to use multiple processes; just replace
from multiprocessing.dummy import Pool
with
from multiprocessing import Pool
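A minimal sketch of the process-based version follows. Here do_stuff_function is a stand-in for the real CPU-bound work, NUM_PROCESSES is an assumed constant (typically the number of CPU cores), and the input dictionary is faked with dummy data:

from multiprocessing import Pool

NUM_PROCESSES = 4  # assumed; usually set to the number of CPU cores

def do_stuff_function(list_of_25):
    # stand-in for the real processing of one 25-element list
    return sum(list_of_25)

if __name__ == '__main__':
    # stand-in input: 80000 keys, each mapping to a list of 25 numbers
    input_dict = {i: list(range(25)) for i in range(80000)}
    iterable_list = list(input_dict.values())

    pool = Pool(NUM_PROCESSES)
    output = pool.map(do_stuff_function, iterable_list)
    pool.close()
    pool.join()

Note that the worker function must be defined at module level so it can be pickled, and the if __name__ == '__main__' guard is needed because multiprocessing may re-import the main module in the worker processes on some platforms.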
Answer 2:
Try using Pool (docs) as it uses processes instead of threads.
Threads in Python are concurrent as opposed to parallel, which basically means that the interpreter switches very quickly between several threads and, while executing one, locks out all the others, giving the impression of running operations in parallel. Processes, on the other hand, are much more costly to spawn in terms of time, as each one is its own instance of the interpreter, and the data it operates on must be serialized and sent from the main process to the worker process. There it is deserialized and processed, and the results are serialized and sent back again.
However, the time it takes to spawn processes might have a negative impact on the performance overall, considering the moderate processing times you posted.
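If that serialization overhead turns out to dominate, Pool.map's chunksize argument can reduce the number of inter-process round trips by shipping the work in batches. A sketch, reusing the setup from the earlier example (the chunk size of 1000 is an arbitrary illustration, not a tuned value):

# batch 1000 lists per task to amortize the per-task IPC cost
output = pool.map(do_stuff_function, iterable_list, chunksize=1000)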
Source: https://stackoverflow.com/questions/39728974/is-my-python-multithreading-code-affected-by-the-global-interpreter-lock