How does copy-on-write work in fork()?

问题

I want to know how copy-on-write happens in fork().

Assuming we have a process A that has a dynamical int array:

int *array = malloc(1000000*sizeof(int));

Elements in array are initialized to some meaningful values. Then, we use fork() to create a child process, namely B. B will iterate the array and do some calculations:

for(a in array){
    a = a+1;
}

I know B will not copy the entire array immediately, but when does the child B allocate memory for array? during fork()?
Does it allocate the entire array all at once, or only a single integer for a = a+1?
a = a+1; how does this happen? Does B read data from A and write new data to its own array?

I wrote some code to explore how COW works. My environment: ubuntu 14.04, gcc4.8.2

#include <stdlib.h>
#include <stdio.h>
#include <sys/sysinfo.h>

void printMemStat(){
    struct sysinfo si;
    sysinfo(&si);
    printf("===\n");
    printf("Total: %llu\n", si.totalram);
    printf("Free: %llu\n", si.freeram);
}

int main(){
    long len = 200000000;
    long *array = malloc(len*sizeof(long));
    long i = 0;
    for(; i<len; i++){
        array[i] = i;
    }

    printMemStat();
    if(fork()==0){
        /*child*/
        printMemStat();

        i = 0;
        for(; i<len/2; i++){
            array[i] = i+1;
        }

        printMemStat();

        i = 0;
        for(; i<len; i++){
            array[i] = i+1;
        }

        printMemStat();

    }else{
        /*parent*/
        int times=10;
        while(times-- > 0){
            sleep(1);
        }
    }
    return 0;
}

After fork(), the child process modifies a half of numbers in array, and then modifies the entire array. The outputs are:

===
Total: 16694571008
Free: 2129162240
===
Total: 16694571008
Free: 2126106624
===
Total: 16694571008
Free: 1325101056
===
Total: 16694571008
Free: 533794816

It seems that the array is not allocated as a whole. If I slightly change the first modification phase to:

i = 0;
for(; i<len/2; i++){
    array[i*2] = i+1;
}

The outputs will be:

===
Total: 16694571008
Free: 2129924096
===
Total: 16694571008
Free: 2126868480
===
Total: 16694571008
Free: 526987264
===
Total: 16694571008
Free: 526987264

回答1:

Depends on the Operating System, hardware architecture and libc. But yes in case of recent Linux with MMU the fork(2) will work with copy-on-write. It will only (allocate and) copy a few system structures and the page table, but the heap pages actually point to the ones of the parent until written.

More control over this can be exercised with the clone(2) call. And vfork(2) beeing a special variant which does not expect the pages to be used. This is typically used before exec().

As for the allocation: the malloc() has meta information over requested memory blocks (address and size) and the C variable is a pointer (both in process memory heap and stacks). Those two look the same for the child (same values because same underlying memory page seen in the address space of both processes). So from a C program point of view the array is already allocated and the variable initialized when the process comes into existence. The underlying memory pages are however pointing to the original physical ones of the parent process, so no extra memory pages are needed until they are modified.

If the child allocates a new array it depends if it fits into the already existing heap pages or if the brk of the process needs to be increased. In both cases only the modified pages get copied and the new pages get allocated only for the child.

This also means that the physical memory might run out after malloc(). (Which is bad as the program cannot check the error return code of "a operation in a random code line"). Some operating systems will not allow this form of overcommit: So if you fork a process it will not allocate the pages, but it requires them to be available at that moment (kind of reserves them) just in case. In Linux this is configurable and called overcommit-accounting.

回答2:

Some systems have a system call vfork(), which was originally designed as a lower-overhead version of fork(). Since fork() involved copying the entire address space of the process, and was therefore quite expensive, the vfork() function was introduced (in 3.0BSD).

However, since vfork() was introduced, the implementation of fork() has improved drastically, most notably with the introduction of 'copy-on-write', where the copying of the process address space is transparently faked by allowing both processes to refer to the same physical memory until either of them modify it. This largely removes the justification for vfork(); indeed, a large proportion of systems now lack the original functionality of vfork() completely. For compatibility, though, there may still be a vfork() call present, that simply calls fork() without attempting to emulate all of the vfork() semantics.

As a result, it is very unwise to actually make use of any of the differences between fork() and vfork(). Indeed, it is probably unwise to use vfork() at all, unless you know exactly why you want to.

The basic difference between the two is that when a new process is created with vfork(), the parent process is temporarily suspended, and the child process might borrow the parent's address space. This strange state of affairs continues until the child process either exits, or calls execve(), at which point the parent process continues.

This means that the child process of a vfork() must be careful to avoid unexpectedly modifying variables of the parent process. In particular, the child process must not return from the function containing the vfork() call, and it must not call exit() (if it needs to exit, it should use _exit(); actually, this is also true for the child of a normal fork()).

来源：https://stackoverflow.com/questions/27161412/how-does-copy-on-write-work-in-fork

标签

Linux

unix

fork

copy-on-write