Why do people write #!/usr/bin/env python on the first line of a Python script?

Backend | Unresolved | 21 answers | 2388 views
刺人心 2020-11-21 06:16

It seems to me like the files run the same without that line.

21 answers
  •  生来不讨喜
    2020-11-21 07:17

    The exec system call of the Linux kernel understands shebangs (#!) natively

    When you run this in bash:

    ./something
    

    on Linux, bash ends up calling the exec system call (execve) with the path ./something.
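    A quick way to see that the kernel, not the shell, handles the shebang is to exec a file directly from Python with no shell in between. This is a sketch that assumes Linux and a temporary directory that allows execution; the file names are hypothetical, and the script writes the absolute path of the running interpreter (sys.executable) in place of /usr/bin/env python so the result does not depend on PATH.

```python
import errno
import os
import stat
import subprocess
import sys
import tempfile


def make_executable(path: str, text: str) -> None:
    """Write text to path and add the owner's execute permission."""
    with open(path, "w") as f:
        f.write(text)
    os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)


def run_without_shell(path: str) -> tuple:
    """Exec path directly (no shell involved) and report what happened."""
    try:
        result = subprocess.run([path], capture_output=True, text=True)
        return ("ran", result.stdout.strip())
    except OSError as exc:
        # execve() refused the file; report the symbolic errno name.
        return ("failed", errno.errorcode.get(exc.errno, str(exc.errno)))


def demo() -> tuple:
    body = 'print("hello from python")\n'
    with tempfile.TemporaryDirectory() as tmp:
        with_shebang = os.path.join(tmp, "with_shebang")
        without_shebang = os.path.join(tmp, "without_shebang")
        # Absolute interpreter path instead of /usr/bin/env python (PATH-independent).
        make_executable(with_shebang, f"#!{sys.executable}\n" + body)
        make_executable(without_shebang, body)
        return run_without_shell(with_shebang), run_without_shell(without_shebang)


with_result, without_result = demo()
print("with shebang:", with_result)
print("without shebang:", without_result)
```

    On Linux, the file with the shebang should run, while the one without it should fail with ENOEXEC ("Exec format error"). Shells such as bash paper over that failure by falling back to running the file as a shell script, and python script.py never looks at the shebang at all, which is why the files can seem to run the same without that line.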

    The kernel then runs this check on the file passed to exec: https://github.com/torvalds/linux/blob/v4.8/fs/binfmt_script.c#L25

    if ((bprm->buf[0] != '#') || (bprm->buf[1] != '!'))
    

    It reads the first two bytes of the file and compares them to #!.

    If they match, the kernel parses the rest of the line and executes the interpreter named there instead: it runs /usr/bin/env with python as its argument, followed by the path of the current file (and then any arguments the script was originally called with):

    /usr/bin/env python /path/to/script.py
    

    This works for any scripting language that uses # as a comment character, because the interpreter then simply skips the shebang line when it reads the file.
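    The parsing step can be approximated in a few lines of Python. This is a hedged sketch of the behaviour of Linux's binfmt_script, not kernel code: the function name shebang_argv is hypothetical, and it ignores details such as the kernel's limit on the length of the #! line. It shows that the interpreter path ends at the first space or tab, and that everything after it is passed as one single argument.

```python
def shebang_argv(script_path: str, first_line: bytes, extra_args: list) -> list:
    """Approximate the argv Linux builds when exec() hits a '#!' file (sketch only)."""
    if not first_line.startswith(b"#!"):
        raise ValueError("not a #! script")
    # Keep only the first line, minus leading/trailing spaces and tabs.
    line = first_line[2:].split(b"\n", 1)[0].strip(b" \t")
    if not line:
        raise ValueError("no interpreter named after #!")
    # The interpreter path ends at the first space or tab.
    end = 0
    while end < len(line) and line[end] not in b" \t":
        end += 1
    interpreter = line[:end].decode()
    # Everything after it (if anything) becomes ONE optional argument.
    optional_arg = line[end:].strip(b" \t").decode()
    argv = [interpreter]
    if optional_arg:
        argv.append(optional_arg)
    # Then the script path as passed to exec, then the caller's original arguments.
    return argv + [script_path] + list(extra_args)


print(shebang_argv("/path/to/script.py", b"#!/usr/bin/env python\n", []))
# ['/usr/bin/env', 'python', '/path/to/script.py']
print(shebang_argv("./something", b"#!/usr/bin/env python -u\n", ["arg1"]))
# ['/usr/bin/env', 'python -u', './something', 'arg1']
```

    The second call illustrates why a line like #!/usr/bin/env python -u does not work as intended on Linux: env receives "python -u" as one argument and looks for a program with that exact name.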

    And yes, you can make an infinite loop with:

    printf '#!/a\n' | sudo tee /a
    sudo chmod +x /a
    /a
    

    The kernel gives up after a few levels of interpreter recursion and fails with ELOOP, which bash reports as:

    -bash: /a: /a: bad interpreter: Too many levels of symbolic links
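    A rough model of why the loop terminates: the kernel follows interpreter chains only up to a small fixed depth and then returns ELOOP, whose glibc error message is the "Too many levels of symbolic links" text bash prints. In this sketch the file contents live in a hypothetical in-memory dictionary, and MAX_DEPTH = 4 is an assumption standing in for the kernel's actual limit, which differs slightly between versions.

```python
import errno
import os

# Hypothetical in-memory "filesystem": path -> first bytes of the file.
FAKE_FILES = {
    "/a": b"#!/a\n",  # the self-referencing script from the example
    "/path/to/script.py": b"#!/usr/bin/env python\n",
    "/usr/bin/env": b"\x7fELF",  # a real binary ends the chain
}

MAX_DEPTH = 4  # assumption: stands in for the kernel's small recursion limit


def resolve_interpreter(path: str, depth: int = 0) -> str:
    """Follow #! lines until a non-script file is reached, like nested execs."""
    if depth > MAX_DEPTH:
        raise OSError(errno.ELOOP, os.strerror(errno.ELOOP), path)
    first_bytes = FAKE_FILES[path]
    if not first_bytes.startswith(b"#!"):
        return path  # e.g. an ELF binary: the kernel can run it directly
    interpreter = first_bytes[2:].split()[0].decode()
    return resolve_interpreter(interpreter, depth + 1)


def outcome(path: str) -> str:
    """Describe what exec of path would end up doing in this model."""
    try:
        return "runs " + resolve_interpreter(path)
    except OSError as exc:
        return "error: " + exc.strerror


print(outcome("/path/to/script.py"))  # runs /usr/bin/env
print(outcome("/a"))  # error: Too many levels of symbolic links (glibc wording)
```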
    

    The #! bytes just happen to be human-readable, but magic bytes do not need to be.

    If the file started with different bytes, the exec system call would use a different handler. The other most important built-in handler is for ELF executable files. It checks for the bytes 7f 45 4c 46 (which also happen to be human-readable: .ELF): https://github.com/torvalds/linux/blob/v4.8/fs/binfmt_elf.c#L1305 Let's confirm that by reading the first 4 bytes of /bin/ls, which is an ELF executable:

    head -c 4 "$(which ls)" | hd 
    

    output:

    00000000  7f 45 4c 46                                       |.ELF|
    00000004                                                                 
    

    So when the kernel sees those bytes, it loads the ELF file into memory and starts running it in place of the program that called exec (exec replaces the current process image; it does not create a new process). See also: How does kernel get an executable binary file running under linux?
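    The dispatch on leading bytes can be sketched as a lookup over magic prefixes. This is only a model: the kernel tries each registered binary-format handler in turn, whereas HANDLERS, classify and classify_file here are hypothetical names covering just the two handlers discussed above.

```python
import sys

# Hypothetical table modelled on the two handlers discussed above; the real
# kernel walks its list of registered binary formats instead.
HANDLERS = [
    (b"#!", "script: exec the interpreter named on the first line"),
    (b"\x7fELF", "ELF: load the binary and start executing it"),
]


def classify(header: bytes) -> str:
    """Pick a handler by comparing the file's first bytes with each magic prefix."""
    for magic, handler in HANDLERS:
        if header.startswith(magic):
            return handler
    return "no handler: exec fails with ENOEXEC (Exec format error)"


def classify_file(path: str) -> str:
    """Read the first 4 bytes of a file, enough for both magics here."""
    with open(path, "rb") as f:
        return classify(f.read(4))


print(classify(b"#!/usr/bin/env python\n"))
print(classify(b"\x7fELF\x02\x01"))
print(classify(b"print('hi')\n"))
print(classify_file(sys.executable))
```

    On Linux, classify_file(sys.executable) should normally pick the ELF handler, matching what head -c 4 shows for ls.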

    Finally, you can register your own handlers for other file types with the binfmt_misc mechanism, which can match files either by magic bytes or by file extension. For example, you can add a custom handler for .jar files. Another application is to transparently run executables of a different architecture with QEMU.
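    A binfmt_misc handler is registered by writing one colon-separated line of the form :name:type:offset:magic:mask:interpreter:flags to /proc/sys/fs/binfmt_misc/register as root. The helper below only builds that string; the function name and the wrapper path /usr/local/bin/jarwrapper are hypothetical. Type E matches by file extension (offset and mask stay empty), while M matches magic bytes.

```python
def binfmt_misc_rule(name: str, kind: str, magic: str, interpreter: str,
                     offset: str = "", mask: str = "", flags: str = "") -> str:
    """Build a :name:type:offset:magic:mask:interpreter:flags registration line."""
    if kind not in ("M", "E"):
        raise ValueError("kind must be 'M' (magic bytes) or 'E' (file extension)")
    if kind == "E" and (offset or mask):
        raise ValueError("offset and mask are not used for extension rules")
    fields = [name, kind, offset, magic, mask, interpreter, flags]
    if any(":" in field for field in fields):
        raise ValueError("fields must not contain the ':' separator")
    return ":" + ":".join(fields)


# Hypothetical wrapper script that would run the .jar file it is given.
rule = binfmt_misc_rule("jar", "E", "jar", "/usr/local/bin/jarwrapper")
print(rule)  # :jar:E::jar::/usr/local/bin/jarwrapper:
```

    The sketch deliberately stops at building the line; actually registering it means writing that line to the register file with root privileges.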

    I don't think POSIX specifies shebangs, however: https://unix.stackexchange.com/a/346214/32558 . It only mentions them in rationale sections, and in the form "if executable scripts are supported by the system something may happen". macOS and FreeBSD do seem to implement them, though.

    PATH search motivation

    One big motivation for shebangs is likely that, on Linux, we often want to run commands from PATH simply as:

    basename-of-command
    

    instead of:

    /full/path/to/basename-of-command
    

    But then, without the shebang mechanism, how would Linux know how to launch each type of file?

    Hardcoding the extension in commands:

     basename-of-command.py
    

    or implementing PATH search on every interpreter:

    python basename-of-command
    

    would be possible, but both approaches share a major problem: every caller breaks if we ever decide to rewrite the command in another language.

    Shebangs solve this problem beautifully.
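    The PATH argument can be demonstrated end to end. The sketch below puts an extensionless Python script into a temporary directory (assumed to allow execution), prepends that directory to PATH, and runs it by bare name with no shell and no interpreter on the command line. The command name mirrors the placeholder basename-of-command, and the shebang uses sys.executable rather than /usr/bin/env python so the demo does not depend on which python is installed; with /usr/bin/env python, env would additionally locate python through the same kind of PATH search.

```python
import os
import shutil
import stat
import subprocess
import sys
import tempfile


def run_by_name() -> tuple:
    """Create an extensionless script, put its directory on PATH, run it by name."""
    with tempfile.TemporaryDirectory() as bin_dir:
        command = os.path.join(bin_dir, "basename-of-command")  # note: no .py
        with open(command, "w") as f:
            f.write(f"#!{sys.executable}\n")  # stands in for #!/usr/bin/env python
            f.write("print('hello from a command found via PATH')\n")
        os.chmod(command, os.stat(command).st_mode | stat.S_IXUSR)

        env = dict(os.environ)
        env["PATH"] = bin_dir + os.pathsep + env.get("PATH", "")
        found = shutil.which("basename-of-command", path=env["PATH"])
        # PATH lookup finds the file; the kernel's shebang handling picks the interpreter.
        result = subprocess.run(["basename-of-command"], env=env,
                                capture_output=True, text=True, check=True)
        return found, command, result.stdout.strip()


found, command, output = run_by_name()
print(found == command)
print(output)
```

    Callers only ever use the name basename-of-command, so the script could later be rewritten in another language, with a different shebang, without changing any of them.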
