Is it true that fork() calls clone() internally?

佛祖请我去吃肉 2020-12-07 11:21

I read here that the clone() system call is used to create a thread in Linux. Now the syntax of clone() is such that a starting routine/function addres

2 answers
  • 2020-12-07 11:46

    @Dietrich did a great job explaining by looking at the implementation. That's amazing! Anyway, there's another way of discovering that: by looking at the calls strace "sniffs".

    We can prepare a very simple program that uses fork(2) and then check our hypothesis (i.e., that no fork syscall is actually made).

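    #include <stdio.h>      /* perror */
    #include <stdlib.h>     /* exit, EXIT_SUCCESS, EXIT_FAILURE */
    #include <string.h>     /* strlen */
    #include <sys/types.h>  /* pid_t */
    #include <unistd.h>     /* fork, write, STDOUT_FILENO, STDERR_FILENO */

    /* Write a string straight to a file descriptor, bypassing stdio buffering. */
    #define WRITE(fd, msg) write((fd), (msg), strlen(msg))

    int main(void)
    {
      pid_t pid;

      switch (pid = fork()) {
        case -1:
          perror("fork");   /* perror() appends ": <reason>" by itself */
          exit(EXIT_FAILURE);
        case 0:             /* child */
          WRITE(STDOUT_FILENO, "Hi, i'm the child");
          exit(EXIT_SUCCESS);
        default:            /* parent: pid holds the child's PID */
          WRITE(STDERR_FILENO, "Heey, parent here!");
          exit(EXIT_SUCCESS);
      }

      return EXIT_SUCCESS;
    }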
    

    Now, compile that code ( clang -Wall -g fork.c -o fork.out ) and then execute it with strace:

    strace -Cfo ./fork.strace.log ./fork.out
    

    This intercepts the system calls made by our process (with -f we also follow the child's calls) and writes them to ./fork.strace.log; the -C option adds a summary at the end. The result on my machine (Ubuntu 14.04, x86_64 Linux 3.16) is (abridged):

    6915  arch_prctl(ARCH_SET_FS, 0x7fa001a93740) = 0
    6915  mprotect(0x7fa00188c000, 16384, PROT_READ) = 0
    6915  mprotect(0x600000, 4096, PROT_READ) = 0
    6915  mprotect(0x7fa001ab9000, 4096, PROT_READ) = 0
    6915  munmap(0x7fa001a96000, 133089)    = 0
    6915  clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fa001a93a10) = 6916
    6915  write(2, "Heey, parent here!", 18) = 18
    6916  write(1, "Hi, i'm the child", 17 <unfinished ...>
    6915  exit_group(0)                     = ?
    6916  <... write resumed> )             = 17
    6916  exit_group(0)                     = ?
    6915  +++ exited with 0 +++
    6916  +++ exited with 0 +++
    % time     seconds  usecs/call     calls    errors syscall
    ------ ----------- ----------- --------- --------- ----------------
     24.58    0.000029           4         7           mmap
     17.80    0.000021           5         4           mprotect
     14.41    0.000017           9         2           write
     11.02    0.000013          13         1           munmap
     11.02    0.000013           4         3         3 access
     10.17    0.000012           6         2           open
      2.54    0.000003           2         2           fstat
      2.54    0.000003           3         1           brk
      1.69    0.000002           2         1           read
      1.69    0.000002           1         2           close
      0.85    0.000001           1         1           clone
      0.85    0.000001           1         1           execve
      0.85    0.000001           1         1           arch_prctl
    ------ ----------- ----------- --------- --------- ----------------
    100.00    0.000118                    28         3 total
    

    As expected, there are no fork calls, just the raw clone syscall with its flags, child stack, etc. properly set.

  • 2020-12-07 11:52

    For questions like this, always read the source code.

    From glibc's nptl/sysdeps/unix/sysv/linux/fork.c (GitHub) (nptl = Native POSIX Thread Library) we can find the implementation of fork(), which is definitely not a syscall. The magic happens inside the ARCH_FORK macro, which is defined as an inline call to clone() in nptl/sysdeps/unix/sysv/linux/x86_64/fork.c (GitHub). But wait, no function or stack pointer is passed to this version of clone()! So, what is going on here?

    Let's look at the implementation of clone() in glibc, then. It's in sysdeps/unix/sysv/linux/x86_64/clone.S (GitHub). You can see that it saves the function pointer on the child's stack and calls the clone syscall; the new process then pops the function off the stack and calls it.

    So it works like this:

    clone(void (*fn)(void *), void *stack_pointer)
    {
        push fn onto stack_pointer
        syscall_clone()
        if (child) {
            pop fn off of stack
            fn();
            exit();
        }
    }
    

    And fork() is...

    fork()
    {
        ...
        syscall_clone();
        ...
    }
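
    To make the wrapper pseudo-code above concrete, here is a minimal sketch, assuming Linux with glibc on x86_64. The names clone_wrapper_demo, child_fn and CHILD_STACK_SIZE are hypothetical and exist only for this illustration, and the 64 KiB stack size is arbitrary. It hands a function and the top of a freshly allocated stack to the clone() library function, then reads back the child's exit status:

    ```c
    #define _GNU_SOURCE           /* exposes clone() in <sched.h> on glibc */
    #include <assert.h>
    #include <sched.h>
    #include <signal.h>
    #include <stdlib.h>
    #include <sys/types.h>
    #include <sys/wait.h>

    #define CHILD_STACK_SIZE (64 * 1024)   /* arbitrary size for this sketch */

    /* Runs in the child, on the stack that was handed to clone(). */
    static int child_fn(void *arg)
    {
        return *(int *)arg;   /* the wrapper turns this into the exit status */
    }

    /* Hypothetical helper: returns the child's exit status, or -1 on error. */
    int clone_wrapper_demo(int code)
    {
        assert(code >= 0 && code <= 255);   /* exit statuses are 8 bits wide */

        char *stack = malloc(CHILD_STACK_SIZE);
        if (stack == NULL)
            return -1;

        /* The stack grows downward on x86_64, so pass its highest address. */
        pid_t pid = clone(child_fn, stack + CHILD_STACK_SIZE, SIGCHLD, &code);
        if (pid == -1) {
            free(stack);
            return -1;
        }

        int status;
        pid_t waited = waitpid(pid, &status, 0);
        free(stack);   /* parent's copy; the child had its own copy */
        if (waited == -1 || !WIFEXITED(status))
            return -1;
        return WEXITSTATUS(status);
    }
    ```

    Calling clone_wrapper_demo(42) from main() should give back 42, since the integer returned by fn becomes the child's exit status. Because CLONE_VM is not set, the child works on its own copy of the address space, which is why reading code through the pointer is safe here.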
    

    Summary

    The actual clone() syscall does not take a function argument; it just continues from the return point, just like fork(). So both the clone() and fork() library functions are wrappers around the clone() syscall.

    Documentation

    My copy of the manual is somewhat more upfront about the fact that clone() is both a library function and a system call. However, I do find it somewhat misleading that clone() is found in section 2, rather than both section 2 and section 3. From the man page:

    #include <sched.h>
    
    int clone(int (*fn)(void *), void *child_stack,
              int flags, void *arg, ...
              /* pid_t *ptid, struct user_desc *tls, pid_t *ctid */ );
    
    /* Prototype for the raw system call */
    
    long clone(unsigned long flags, void *child_stack,
              void *ptid, void *ctid,
              struct pt_regs *regs);
    

    And,

    This page describes both the glibc clone() wrapper function and the underlying system call on which it is based. The main text describes the wrapper function; the differences for the raw system call are described toward the end of this page.

    Finally,

    The raw clone() system call corresponds more closely to fork(2) in that execution in the child continues from the point of the call. As such, the fn and arg arguments of the clone() wrapper function are omitted. Furthermore, the argument order changes.
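
    As a rough illustration of that last quote, the raw system call can be invoked through syscall(2) with no function and no stack, and both processes then simply carry on after the call, just as with fork(). This is a sketch under assumptions, not what glibc's fork() does verbatim: it targets x86_64 (the raw argument order differs on some architectures), it skips the bookkeeping glibc performs around fork(), and fork_via_raw_clone and raw_clone_demo are made-up names:

    ```c
    #define _GNU_SOURCE           /* exposes syscall() in <unistd.h> on glibc */
    #include <assert.h>
    #include <signal.h>
    #include <sys/syscall.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    /* Hypothetical helper: a fork()-like call made through the raw syscall.
     * Assumes an architecture such as x86_64 where flags is the first argument. */
    static pid_t fork_via_raw_clone(void)
    {
        /* flags = SIGCHLD, stack = 0 (keep the copied stack), rest unused. */
        return (pid_t)syscall(SYS_clone, (unsigned long)SIGCHLD,
                              0UL, 0UL, 0UL, 0UL);
    }

    /* Hypothetical helper: returns the child's exit status, or -1 on error. */
    int raw_clone_demo(int code)
    {
        assert(code >= 0 && code <= 255);   /* exit statuses are 8 bits wide */

        pid_t pid = fork_via_raw_clone();   /* both processes return here */
        if (pid == -1)
            return -1;
        if (pid == 0)
            _exit(code);   /* child: no fn was called, it just fell through */

        int status;
        if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status))
            return -1;
        return WEXITSTATUS(status);
    }
    ```

    Passing only SIGCHLD as flags asks for a plain copy of the process that signals the parent on exit. The strace output in the other answer shows that glibc's fork() additionally sets CLONE_CHILD_CLEARTID and CLONE_CHILD_SETTID along with a child_tidptr.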
