Python Pool Multiprocessing with functions

Submitted by 巧了我就是萌 on 2019-12-12 02:31:22

Question


Okay, I've been playing with some code, partly to get a better understanding of Python and partly to scrape some data from the web. Part of what I want to learn about is using Python multiprocessing and Pool.

I've got the basics working. However, because I wrote the procedure single-threaded first and then moved it to Pool to run the work in parallel, I have both global variables and calls to globally defined functions. I'm guessing both of these are bad, but when I search the web, things either get very complicated very fast or don't answer my questions.

Firstly, can anybody confirm that global variables are bad and could lead to problems? To me this makes sense, because two threads could access the same variable at the same time and cause problems.

Secondly, if I have a globally defined function, that for the sake of argument processes a string and returns it, using standard string functions, is it okay to call this from within the pool process?


Answer 1:


Multithreading and multiprocessing are quite different when it comes to how your variables and functions can be accessed. Separate processes (multiprocessing) have different memory spaces, so each process works on its own copy of any module-level variable and a change made in one process is not seen by the others. In that sense, global variables shared between processes don't really exist. Sharing data between processes has to be done through something that passes the data for you, such as pipes or queues. Both the main process and the child process can have access to the same queue though, so in a way you could think of that as a type of global variable.
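
As a minimal sketch of the queue approach described above, the example below sends strings to a child process over one `multiprocessing.Queue` and reads the results back over another. The names `shout`, `worker`, `tasks` and `results` are made up for illustration, and using `None` as a stop signal is just one common convention, not the only way to do it.

```python
import multiprocessing as mp

def shout(text):
    # Stand-in for real work; hypothetical helper for this sketch.
    return text.upper() + "!"

def worker(tasks, results):
    # Runs in the child process. It only sees data that arrives via the queue.
    while True:
        item = tasks.get()
        if item is None:  # sentinel: no more work
            break
        results.put(shout(item))

def main():
    tasks, results = mp.Queue(), mp.Queue()
    child = mp.Process(target=worker, args=(tasks, results))
    child.start()

    words = ["spam", "eggs"]
    for word in words:
        tasks.put(word)
    tasks.put(None)

    # Read the results before joining so the child can flush its queue.
    out = [results.get() for _ in words]
    child.join()
    print(out)

if __name__ == "__main__":
    main()
```

Both queues are passed to the child as arguments rather than read from globals, which keeps the example working regardless of how the child process is started.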

With multithreading you can definitely access global variables, and it can be a good way to program if your program is simple. For example, a child thread may read the value of a variable set by the main thread and use it as a flag in the child thread's function. You need to be aware of which operations are thread-safe, however. As you say, complex operations by multiple threads on the same object can result in conflicts, and in that case you need to use a lock or some other safe method. Many operations are naturally atomic and therefore thread-safe, for instance reading a single variable. There's a good list of threadsafe operations and thread syncing on this page.
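
To illustrate the locking case mentioned above, here is a small sketch in which several threads increment one global counter. The names `counter`, `counter_lock`, `add_many` and `run` are invented for this example. `counter += 1` is a read-modify-write, so it is not atomic, and the lock is what guarantees that no increment is lost.

```python
import threading

counter = 0
counter_lock = threading.Lock()

def add_many(n):
    global counter
    for _ in range(n):
        # Only one thread at a time may run the read-modify-write below.
        with counter_lock:
            counter += 1

def run(num_threads=4, n=10000):
    global counter
    counter = 0
    threads = [threading.Thread(target=add_many, args=(n,))
               for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter

if __name__ == "__main__":
    print(run())  # 4 threads * 10000 increments = 40000
```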

Generally with multiprocessing and multithreading you have some time consuming function that you pass to the thread or the process, but they won't be rerunning the same instance of that function. The below example shows a valid use case for multiple threads atomically accessing a global variable. The separate processes however won't be able to.

import multiprocessing as mp
import threading
import time

work_flag = True

def worker_func():
    global work_flag
    while True:
        if work_flag:
            # do stuff
            time.sleep(1)
            print(mp.current_process().name, 'working, work_flag =', work_flag)
        else:
            time.sleep(0.1)

def main():
    global work_flag

    # processes can't access the same "instance" of work_flag!
    process = mp.Process(target = worker_func)
    process.daemon = True
    process.start()

    # threads can safely read global work_flag
    thread = threading.Thread(target = worker_func)
    thread.daemon = True
    thread.start()

    while True:
        time.sleep(3)
        # changing this flag will stop the thread, but not the process
        work_flag = False

if __name__ == '__main__':
    main()
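
On the second part of the question, calling a module-level function from pool workers is fine as long as the function gets everything it needs through its arguments and hands its result back through its return value. The sketch below assumes a hypothetical string-cleaning function, `clean_title`, and a wrapper, `clean_all`; neither comes from the original post.

```python
import multiprocessing as mp

def clean_title(raw):
    # Pure string processing: no shared state is read or written,
    # so it behaves the same in any worker process.
    return " ".join(raw.split()).title()

def clean_all(raw_titles, workers=2):
    # Pool.map sends each item to a worker process and
    # returns the results in the same order as the input.
    with mp.Pool(processes=workers) as pool:
        return pool.map(clean_title, raw_titles)

if __name__ == "__main__":
    print(clean_all(["  hello   world ", "python\tpool"]))
    # ['Hello World', 'Python Pool']
```

The function has to be defined at module level so that the worker processes can import it by name; a lambda or a nested function cannot be pickled for `Pool.map`.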


Source: https://stackoverflow.com/questions/29317177/python-pool-multiprocessing-with-functions
