Question
I have this code:
import os

pid = os.fork()
if pid == 0:
    os.environ['HOME'] = "rep1"
    external_function()
else:
    os.environ['HOME'] = "rep2"
    external_function()
and this code:
import os
from multiprocessing import Process, Pipe

def f(conn):
    os.environ['HOME'] = "rep1"
    external_function()
    conn.send(some_data)
    conn.close()

if __name__ == '__main__':
    os.environ['HOME'] = "rep2"
    external_function()
    parent_conn, child_conn = Pipe()
    p = Process(target=f, args=(child_conn,))
    p.start()
    print parent_conn.recv()
    p.join()
The external_function initializes an external program by creating the necessary sub-directories inside the directory named by the HOME environment variable. It does this work only once per process.
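A minimal sketch of the kind of thing external_function does (the sub-directory names and the once-per-process guard below are only illustrative, not my real code):

import os

_initialized = False  # illustrative once-per-process guard

def external_function():
    # Sketch only: create the sub-directories the external program needs
    # under whatever directory HOME points to, once per process.
    global _initialized
    if _initialized:
        return
    home = os.environ['HOME']
    for sub in ('config', 'cache'):  # assumed sub-directory names
        path = os.path.join(home, sub)
        if not os.path.isdir(path):
            os.makedirs(path)
    _initialized = True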
With the first example, which uses os.fork(), the directories are created as expected. But with the second example, which uses multiprocessing, only the directories in rep2 get created.
Why isn't the second example creating directories in both rep1 and rep2?
Answer 1:
The answer you are looking for is addressed in detail here, together with an explanation of how the behaviour differs between operating systems.
One big issue is that the fork system call does not exist on Windows, so you cannot use that method when running a Windows OS. multiprocessing is a higher-level interface for executing part of the currently running program; like forking, it creates a copy of your process's current state. In other words, it takes care of the forking for you. So, where it is available, you can consider fork() the lower-level interface to forking a program, and the multiprocessing library the higher-level one.
Hope this helps.
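A quick way to see that relationship (a sketch, assuming Python 3.4+ for get_start_method; on Linux the default start method is 'fork', while Windows only has 'spawn'):

import multiprocessing
import os

def child():
    # With the 'fork' start method the child inherits a copy of the
    # parent's state, including the HOME value set just before start().
    print(os.environ.get('HOME'))

if __name__ == '__main__':
    # 'fork' on Linux; 'spawn' on Windows (and on macOS since Python 3.8).
    print(multiprocessing.get_start_method())
    os.environ['HOME'] = "rep_parent"
    p = multiprocessing.Process(target=child)
    p.start()
    p.join()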
Answer 2:
To answer your question directly: there must be some side effect of external_function that makes the results differ when the code runs in series rather than at the same time. That comes from how you set up your code, not from any difference between os.fork and multiprocessing.Process on systems where os.fork is supported.
The only real differences between os.fork and multiprocessing.Process are portability and library overhead: os.fork is not supported on Windows, and the multiprocessing framework is there to make multiprocessing.Process work anyway. Where it is supported, os.fork is what multiprocessing.Process calls underneath, as this answer backs up.
The important distinction, then, is that os.fork copies everything in the current process using Unix's fork, so at the moment of forking both processes are identical apart from their PIDs. On Windows this is emulated by rerunning all the setup code before the if __name__ == '__main__': guard, which is roughly the same as creating a subprocess with the subprocess library.
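To make that emulation visible, here is a small sketch (assuming Python 3.4+, where the 'spawn' start method can also be selected on Unix): every child re-imports the main module, so module-level code runs again, which is exactly why the if __name__ == '__main__': guard matters.

import multiprocessing
import os

# Module-level code: under the 'spawn' start method this line runs again
# in every child process, because the child re-imports the module.
print('module-level code ran in pid %d' % os.getpid())

def child():
    pass

if __name__ == '__main__':
    multiprocessing.set_start_method('spawn')  # Windows-like behaviour
    p = multiprocessing.Process(target=child)
    p.start()
    p.join()
    # The message above is printed twice: once by the parent,
    # once by the spawned child.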
For you, the two code snippets you provide do fairly different things: in the second snippet you call external_function in the main process before you start the new process, so the two calls run in series, although in separate processes. Also, the Pipe is unnecessary, as it reproduces no functionality from the first snippet.
In Unix, the code snippets:
import os

pid = os.fork()
if pid == 0:
    os.environ['HOME'] = "rep1"
    external_function()
else:
    os.environ['HOME'] = "rep2"
    external_function()
and:
import os
from multiprocessing import Process

def f():
    os.environ['HOME'] = "rep1"
    external_function()

if __name__ == '__main__':
    p = Process(target=f)
    p.start()
    os.environ['HOME'] = "rep2"
    external_function()
    p.join()
should do exactly the same thing, with only a little extra overhead from the multiprocessing library.
Without further information, we can't figure out what the issue is. If you can provide code that demonstrates the issue, that would help us help you.
Source: https://stackoverflow.com/questions/24041935/difference-in-behavior-between-os-fork-and-multiprocessing-process