Question
Okay, I've been playing with some code, partly to get a better understanding of Python and partly to scrape some data from the web. Part of what I want to learn about is using Python multiprocessing and Pool.
I've got the basics working. However, because I wrote the procedure single-threaded first and then moved to Pool to multi-thread the process, I have both global variables and calls to globally defined functions. I'm guessing both of these are bad, but when I search the web, things either get very complicated very fast or don't answer my questions.
Firstly, can anybody confirm that global variables are bad and could lead to problems? To me this makes sense, because two threads could access the same variable at the same time.
Secondly, if I have a globally defined function that, for the sake of argument, processes a string using standard string functions and returns it, is it okay to call this from within the pool process?
Answer 1:
Multithreading and multiprocessing are quite different when it comes to how your variables and functions can be accessed. Separate processes (multiprocessing) have separate memory spaces and therefore cannot access the same instances of functions or variables. Each process works on its own copy of any module-level global, and a change made in one process is not visible in the others, so global variables do not behave as shared state. Sharing data between processes has to be done via pipes or queues that pass the data for you. Both the main process and the child process can hold the same queue, though, so in a way you could think of that as a kind of global variable.
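To make the separate-memory point concrete, here is a minimal sketch. The names settings, describe and child are hypothetical and chosen only for illustration. The child changes a module-level value and reports what it sees through a queue, while the parent's copy should stay unchanged.

```python
import multiprocessing as mp

# Module-level ("global") state; every process gets its own copy of it.
settings = {"mode": "original"}


def describe(tag):
    """Return a short description of the settings seen by the calling process."""
    return f"{tag}: mode={settings['mode']}"


def child(queue):
    # This change only affects the child's copy of settings.
    settings["mode"] = "changed in child"
    # The queue is the channel that carries data back to the parent.
    queue.put(describe("child"))


def main():
    queue = mp.Queue()
    process = mp.Process(target=child, args=(queue,))
    process.start()
    print(queue.get())         # what the child saw after its own change
    process.join()
    print(describe("parent"))  # the parent's copy is untouched


if __name__ == "__main__":
    main()
```

The parent reads from the queue before joining the child, so the child is never left blocked on data that has not been consumed.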
With multithreading you can definitely access global variables, and it can be a good way to program if your program is simple. For example, a child thread may read the value of a variable set by the main thread and use it as a flag in the child thread's function. You need to be aware of which operations are threadsafe, however. As you say, complex operations by multiple threads on the same object can result in conflicts, and in that case you need to use a lock or some other safe method. Many operations are naturally atomic and therefore threadsafe, though, for instance reading a single variable. There's a good list of threadsafe operations and thread syncing on this page.
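As a rough illustration of the locking mentioned above, the sketch below has several threads increment one global counter. The names counter, counter_lock, add_many and run_threads are made up for this example. An increment is a read followed by a write, so it is guarded with threading.Lock.

```python
import threading

counter = 0                      # shared global, visible to every thread
counter_lock = threading.Lock()  # guards every update of counter


def add_many(n):
    """Increment the shared counter n times, holding the lock for each update."""
    global counter
    for _ in range(n):
        with counter_lock:
            counter += 1


def run_threads(num_threads=4, n=10000):
    """Reset the counter, run the worker threads, and return the final total."""
    global counter
    counter = 0
    threads = [threading.Thread(target=add_many, args=(n,))
               for _ in range(num_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter


if __name__ == "__main__":
    print(run_threads())
```

With the lock in place, the total should always equal num_threads * n. Without it, updates from different threads could in principle be lost.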
Generally, with multiprocessing and multithreading you have some time-consuming function that you pass to the thread or the process, and each thread or process runs its own call of that function. The example below shows a valid use case: a thread reading a global flag that the main thread changes. The separate process cannot see that change, because it only has its own copy of the flag.
import multiprocessing as mp
import threading
import time

work_flag = True

def worker_func():
    global work_flag
    while True:
        if work_flag:
            # do stuff
            time.sleep(1)
            print(mp.current_process().name, 'working, work_flag =', work_flag)
        else:
            time.sleep(0.1)

def main():
    global work_flag
    # processes can't access the same "instance" of work_flag!
    process = mp.Process(target=worker_func)
    process.daemon = True
    process.start()
    # threads can safely read global work_flag
    thread = threading.Thread(target=worker_func)
    thread.daemon = True
    thread.start()
    while True:
        time.sleep(3)
        # clearing the flag makes the thread stop working (it idles instead),
        # but the child process keeps working because its own copy stays True
        work_flag = False

if __name__ == '__main__':
    main()
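On the second part of the question, calling a globally defined string function from a pool, a minimal sketch could look like the following. The helper clean_text and the sample strings are hypothetical. The function is defined at module level so that worker processes can look it up by name, and it only uses its argument and return value, so no shared state is involved.

```python
import multiprocessing as mp


def clean_text(text):
    """Hypothetical string helper: trim surrounding whitespace and lowercase."""
    return text.strip().lower()


def main():
    raw = ["  Hello ", "WORLD  ", " Python\n"]
    # Each worker process calls clean_text on some of the items;
    # map returns the results in the same order as the inputs.
    with mp.Pool(processes=2) as pool:
        cleaned = pool.map(clean_text, raw)
    print(cleaned)  # ['hello', 'world', 'python']


if __name__ == "__main__":
    main()
```

Because the function takes everything it needs as arguments, it behaves the same way whether it is called directly or through the pool.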
Source: https://stackoverflow.com/questions/29317177/python-pool-multiprocessing-with-functions