C/C++ function definitions without assembly

I always thought that functions like printf() are, in the last step, defined using inline assembly. That deep in the bowels of stdio.h is buried some asm code that actually tells the CPU what to do. For example, in DOS, I remember it was implemented by first moving the beginning of the string to some memory location or register and then calling an interrupt.

However, since the x64 version of Visual Studio doesn't support inline assembler at all, it made me wonder how there could be no assembler-defined functions at all in C/C++. How does a library function like printf() get implemented in C/C++ without using assembler code? What actually executes the right software interrupt? Thanks.

How does a library function like printf() get implemented in C/C++ without using assembler code? What actually executes the right software interrupt?

For most practical purposes, you can't really call the BIOS from Linux or from Windows. And really, you shouldn't want to interact with the BIOS at all--unless you're writing an operating system or bootloader.

Since you're specifically asking about C functions like printf(), what I'll provide here is a little trace I did to find out "where the rubber meets the road" for GNU's libc. Spoiler alert: it ends at syscall().

System calls aren't the BIOS, but just a table of numbered functions with expected parameters the OS has for performing basic services. In a way it's similar, in the sense of "invoking something by number that is an agreed upon convention with some parameters". Though that's kind of what all software is, so we should probably emphasize the difference: you're talking to the OS, not the hardware in real mode.

So here's a delve specifically into GNU libc's printf...for those who are not easily bored:

First Steps

We’ll of course start with the prototype for printf, which is defined in the file libc/libio/stdio.h

extern int printf (__const char *__restrict __format, ...);

You won’t find the source code for a function called printf, however. Instead, in the file /libc/stdio-common/printf.c you’ll find a little bit of code associated with a function called __printf:

int __printf (const char *format, ...)
{
    va_list arg;
    int done;

    va_start (arg, format);
    done = vfprintf (stdout, format, arg);
    va_end (arg);

    return done;
}

A macro in the same file sets up an association so that this function is defined as an alias for the non-underscored printf:

ldbl_strong_alias (__printf, printf);

It makes sense that printf would be a thin layer that calls vfprintf with stdout. Indeed, the meat of the formatting work is done in vfprintf, which you’ll find in libc/stdio-common/vfprintf.c. It’s quite a lengthy function, but you can see that it’s still all in C!

Deeper Down the Rabbit Hole…

vfprintf mysteriously calls outchar and outstring, which are weird macros defined in the same file:

#define outchar(Ch) \
   do \
   { \
       register const INT_T outc = (Ch); \
       if (PUTC (outc, s) == EOF || done == INT_MAX) \
       { \
            done = -1; \
            goto all_done; \
       } \
       ++done; \
   } \
   while (0)

Sidestepping the question of why it’s so weird, we see that it’s dependent on the enigmatic PUTC, also in the same file:

#define PUTC(C, F) _IO_putwc_unlocked (C, F)

When you get to the definition of _IO_putwc_unlocked in libc/libio/libio.h, you might start thinking that you no longer care how printf works:

#define _IO_putwc_unlocked(_wch, _fp) \
   (_IO_BE ((_fp)->_wide_data->_IO_write_ptr \
        >= (_fp)->_wide_data->_IO_write_end, 0) \
        ? __woverflow (_fp, _wch) \
        : (_IO_wint_t) (*(_fp)->_wide_data->_IO_write_ptr++ = (_wch)))

But despite being a little hard to read, it’s just doing buffered output. If there’s enough room in the file pointer’s buffer, then it will just stick the character into it… but if not, it calls __woverflow. Since the only option when you’ve run out of buffer is to flush to the screen (or whatever device your file pointer represents), we can hope to find the magic incantation there.

Vtables in C?

If you guessed that we’re going to hop through another frustrating level of indirection, you’d be right. Look in libc/libio/wgenops.c and you’ll find the definition of __woverflow:

wint_t 
__woverflow (f, wch)
    _IO_FILE *f;
    wint_t wch;
{
    if (f->_mode == 0)
        _IO_fwide (f, 1);
    return _IO_OVERFLOW (f, wch);
}

Basically, file pointers are implemented in the GNU standard library as objects. They have data members but also function members which you can call with variations of the JUMP macro. In the file libc/libio/libioP.h you’ll find a little documentation of this technique:

/* THE JUMPTABLE FUNCTIONS.

 * The _IO_FILE type is used to implement the FILE type in GNU libc,
 * as well as the streambuf class in GNU iostreams for C++.
 * These are all the same, just used differently.
 * An _IO_FILE (or FILE) object is allows followed by a pointer to
 * a jump table (of pointers to functions).  The pointer is accessed
 * with the _IO_JUMPS macro.  The jump table has a eccentric format,
 * so as to be compatible with the layout of a C++ virtual function table.
 * (as implemented by g++).  When a pointer to a streambuf object is
 * coerced to an (_IO_FILE*), then _IO_JUMPS on the result just
 * happens to point to the virtual function table of the streambuf.
 * Thus the _IO_JUMPS function table used for C stdio/libio does
 * double duty as the virtual function table for C++ streambuf.
 *
 * The entries in the _IO_JUMPS function table (and hence also the
 * virtual functions of a streambuf) are described below.
 * The first parameter of each function entry is the _IO_FILE/streambuf
 * object being acted on (i.e. the 'this' parameter).
 */

So when we find _IO_OVERFLOW in libc/libio/genops.c, we find it’s a macro which calls a “1-parameter” __overflow method on the file pointer:

#define _IO_OVERFLOW(FP, CH) JUMP1 (__overflow, FP, CH)

The jump tables for the various file pointer types are in libc/libio/fileops.c

const struct _IO_jump_t _IO_file_jumps =
{
  JUMP_INIT_DUMMY,
  JUMP_INIT(finish, INTUSE(_IO_file_finish)),
  JUMP_INIT(overflow, INTUSE(_IO_file_overflow)),
  JUMP_INIT(underflow, INTUSE(_IO_file_underflow)),
  JUMP_INIT(uflow, INTUSE(_IO_default_uflow)),
  JUMP_INIT(pbackfail, INTUSE(_IO_default_pbackfail)),
  JUMP_INIT(xsputn, INTUSE(_IO_file_xsputn)),
  JUMP_INIT(xsgetn, INTUSE(_IO_file_xsgetn)),
  JUMP_INIT(seekoff, _IO_new_file_seekoff),
  JUMP_INIT(seekpos, _IO_default_seekpos),
  JUMP_INIT(setbuf, _IO_new_file_setbuf),
  JUMP_INIT(sync, _IO_new_file_sync),
  JUMP_INIT(doallocate, INTUSE(_IO_file_doallocate)),
  JUMP_INIT(read, INTUSE(_IO_file_read)),
  JUMP_INIT(write, _IO_new_file_write),
  JUMP_INIT(seek, INTUSE(_IO_file_seek)),
  JUMP_INIT(close, INTUSE(_IO_file_close)),
  JUMP_INIT(stat, INTUSE(_IO_file_stat)),
  JUMP_INIT(showmanyc, _IO_default_showmanyc),
  JUMP_INIT(imbue, _IO_default_imbue)
};
libc_hidden_data_def (_IO_file_jumps)

There’s also a #define which equates _IO_new_file_overflow with _IO_file_overflow, and the former is defined in the same source file. (Note: INTUSE is just a macro which marks functions that are for internal use; it doesn’t mean anything like “this function uses an interrupt”.)

Are we there yet?!

The source code for _IO_new_file_overflow does a bunch more buffer manipulation, but it does call _IO_do_flush:

#define _IO_do_flush(_f) \
    INTUSE(_IO_do_write)(_f, (_f)->_IO_write_base, \
        (_f)->_IO_write_ptr-(_f)->_IO_write_base)

We’re now at a point where _IO_do_write is probably where the rubber actually meets the road: an unbuffered, actual, direct write to an I/O device. At least we can hope! It is mapped by a macro to _IO_new_do_write and we have this:

static
_IO_size_t
new_do_write (fp, data, to_do)
     _IO_FILE *fp;
     const char *data;
     _IO_size_t to_do;
{
  _IO_size_t count;
  if (fp->_flags & _IO_IS_APPENDING)
    /* On a system without a proper O_APPEND implementation,
       you would need to sys_seek(0, SEEK_END) here, but is
       is not needed nor desirable for Unix- or Posix-like systems.
       Instead, just indicate that offset (before and after) is
       unpredictable. */
    fp->_offset = _IO_pos_BAD;
  else if (fp->_IO_read_end != fp->_IO_write_base)
    {
      _IO_off64_t new_pos
        = _IO_SYSSEEK (fp, fp->_IO_write_base - fp->_IO_read_end, 1);
      if (new_pos == _IO_pos_BAD)
        return 0;
      fp->_offset = new_pos;
    }
  count = _IO_SYSWRITE (fp, data, to_do);
  if (fp->_cur_column && count)
    fp->_cur_column = INTUSE(_IO_adjust_column) (fp->_cur_column - 1, data,
                         count) + 1;
  _IO_setg (fp, fp->_IO_buf_base, fp->_IO_buf_base, fp->_IO_buf_base);
  fp->_IO_write_base = fp->_IO_write_ptr = fp->_IO_buf_base;
  fp->_IO_write_end = (fp->_mode <= 0
               && (fp->_flags & (_IO_LINE_BUF+_IO_UNBUFFERED))
               ? fp->_IO_buf_base : fp->_IO_buf_end);
  return count;
}

Sadly we’re stuck again… _IO_SYSWRITE is doing the work:

/* The 'syswrite' hook is used to write data from an existing buffer
   to an external file.  It generalizes the Unix write(2) function.
   It matches the streambuf::sys_write virtual function, which is
   specific to this implementation. */
typedef _IO_ssize_t (*_IO_write_t) (_IO_FILE *, const void *, _IO_ssize_t);
#define _IO_SYSWRITE(FP, DATA, LEN) JUMP2 (__write, FP, DATA, LEN)
#define _IO_WSYSWRITE(FP, DATA, LEN) WJUMP2 (__write, FP, DATA, LEN)

So inside of the do_write we call the write method on the file pointer. We know from our jump table above that is mapped to _IO_new_file_write, so what’s that do?

_IO_ssize_t
_IO_new_file_write (f, data, n)
     _IO_FILE *f;
     const void *data;
     _IO_ssize_t n;
{
  _IO_ssize_t to_do = n;
  while (to_do > 0)
    {
      _IO_ssize_t count = (__builtin_expect (f->_flags2
                                             & _IO_FLAGS2_NOTCANCEL, 0)
                           ? write_not_cancel (f->_fileno, data, to_do)
                           : write (f->_fileno, data, to_do));
      if (count < 0)
        {
          f->_flags |= _IO_ERR_SEEN;
          break;
        }
      to_do -= count;
      data = (void *) ((char *) data + count);
    }
  n -= to_do;
  if (f->_offset >= 0)
    f->_offset += n;
  return n;
}

Now it just calls write! Well where is the implementation for that? You’ll find write in libc/posix/unistd.h:

/* Write N bytes of BUF to FD.  Return the number written, or -1.

   This function is a cancellation point and therefore not marked with
   __THROW.  */
extern ssize_t write (int __fd, __const void *__buf, size_t __n) __wur;

(Note: __wur is a macro for __attribute__ ((__warn_unused_result__)))

Functions Generated From a Table

That’s only a prototype for write. You won’t find a write.c file for Linux in the GNU standard library. Instead, you’ll find platform-specific methods of connecting to the OS write function in various ways, all in the libc/sysdeps/ directory.

We’ll keep following along with how Linux does it. There is a file called sysdeps/unix/syscalls.list which is used to generate the write function automatically. The relevant data from the table is:

File name: write
Caller: “-” (i.e. Not Applicable)
Syscall name: write
Args: Ci:ibn
Strong name: __libc_write
Weak names: __write, write

Not all that mysterious, except for the Ci:ibn. The C means “cancellable”. The colon separates the return type from the argument types, and if you want a deeper explanation of what they mean then you can see the comment in the shell script which generates the code, libc/sysdeps/unix/make-syscalls.sh.

So now we’re expecting to be able to link against a function called __libc_write which is generated by this shell script. But what’s being generated? Some C code which implements write via a macro called SYS_ify, which you’ll find in sysdeps/unix/sysdep.h

#define SYS_ify(syscall_name) __NR_##syscall_name

Ah, good old token-pasting :P. So basically, the implementation of this __libc_write becomes nothing more than a proxy invocation of the syscall function with a parameter named __NR_write, and the other arguments.

Where The Sidewalk Ends…

I know this has been a fascinating journey, but now we’re at the end of GNU libc. That number __NR_write is defined by Linux. For 32-bit x86 architectures it will get you to linux/arch/x86/include/asm/unistd_32.h:

#define __NR_write 4

The only thing left to look at, then, is the implementation of syscall. Which I may do at some point, but for now I’ll just point you over to some references for how to add a system call to Linux.

First, you have to understand the concept of rings.
A kernel runs in ring 0, meaning it has full access to memory and to every opcode.
A program usually runs in ring 3. It has limited access to memory and cannot use all the opcodes.

So when software needs more privileges (for opening a file, writing to a file, allocating memory, etc.), it needs to ask the kernel.
This can be done in several ways: software interrupts, SYSENTER, etc.

Let's take the example of software interrupts, with the printf() function:
1 - Your software calls printf().
2 - printf() processes your string and args, and then needs to execute a kernel function, as writing to a file can't be done in ring 3.
3 - printf() generates a software interrupt, after placing in a register the number of a kernel function (in this case, the write() function).
4 - The software's execution is interrupted, and the instruction pointer moves to the kernel code. So we are now in ring 0, in a kernel function.
5 - The kernel processes the request, writing to the file (stdout is a file descriptor).
6 - When done, the kernel returns to the software's code, using the iret instruction.
7 - The software's code continues.

So the functions of the C standard library can be implemented in C. All the library has to know is how to call the kernel when it needs more privileges.

In Linux, the strace utility allows you to see what system calls are made by a program. So, taking a program like this


    #include <stdio.h>

    int main(void){
        printf("x");
        return 0;
    }

Say you compile it as printx; then strace ./printx gives


    execve("./printx", ["./printx"], [/* 49 vars */]) = 0
    brk(0)                                  = 0xb66000
    access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
    mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa6dc0e5000
    access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
    open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
    fstat(3, {st_mode=S_IFREG|0644, st_size=119796, ...}) = 0
    mmap(NULL, 119796, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fa6dc0c7000
    close(3)                                = 0
    access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
    open("/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
    read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\200\30\2\0\0\0\0\0"..., 832) = 832
    fstat(3, {st_mode=S_IFREG|0755, st_size=1811128, ...}) = 0
    mmap(NULL, 3925208, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7fa6dbb06000
    mprotect(0x7fa6dbcbb000, 2093056, PROT_NONE) = 0
    mmap(0x7fa6dbeba000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1b4000) = 0x7fa6dbeba000
    mmap(0x7fa6dbec0000, 17624, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7fa6dbec0000
    close(3)                                = 0
    mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa6dc0c6000
    mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa6dc0c5000
    mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa6dc0c4000
    arch_prctl(ARCH_SET_FS, 0x7fa6dc0c5700) = 0
    mprotect(0x7fa6dbeba000, 16384, PROT_READ) = 0
    mprotect(0x600000, 4096, PROT_READ)     = 0
    mprotect(0x7fa6dc0e7000, 4096, PROT_READ) = 0
    munmap(0x7fa6dc0c7000, 119796)          = 0
    fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0
    mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa6dc0e4000
    write(1, "x", 1x)                        = 1
    exit_group(0)                           = ?

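The rubber meets the road (sort of, see below) in the next-to-last call of the trace: write(1, "x", 1). (The stray x after the length in the trace is most likely the program's own output, interleaved with strace's on the same terminal.) At this point control passes from the user-land printx to the Linux kernel, which handles the rest. write() is a wrapper function declared in unistd.h: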


    extern ssize_t write (int __fd, __const void *__buf, size_t __n) __wur;

Most system calls are wrapped in this way. The wrapper function, as its name suggests, is little more than a thin layer of code that places the arguments in the correct registers and then executes software interrupt 0x80. The kernel traps the interrupt and the rest is history. Or at least that's the way it used to work. Apparently, the overhead of interrupt trapping was quite high and, as an earlier post pointed out, modern CPU architectures introduced the sysenter instruction, which accomplishes the same result faster. The page "System Calls" has quite a nice summary of how system calls work.

I feel that you will probably be a bit disappointed with this answer, as was I. Clearly, in some sense, this is a false bottom, as there are still quite a few things that have to happen between the call to write() and the point at which the graphics card frame buffer is actually modified to make the letter "x" appear on your screen. Zooming in on the point of contact (to stay with the "rubber against the road" analogy) by diving into the kernel is sure to be an educational, if time-consuming, endeavor. I am guessing you would have to travel through several layers of abstraction like buffered output streams, character devices, etc. Be sure to post the results should you decide to follow up on this :)

The standard library functions are implemented on an underlying platform library (e.g. UNIX API) and/or by direct system calls (that are still C functions). The system calls are (on platforms that I know of) internally implemented by a call to a function with inline asm that puts a system call number and parameters in CPU registers and triggers an interrupt that the kernel then processes.

There are also other ways of communicating with hardware besides syscalls, but these are usually unavailable or rather limited when running under a modern operating system, or at least enabling them requires some syscalls. A device may be memory mapped, so that writes to certain memory addresses (via regular pointers) control the device. I/O ports are also often used, and depending on the architecture these are accessed by special CPU opcodes or they, too, may be memory mapped to specific addresses.

Well, all C++ statements except the semicolon and comments end up becoming machine code that tells the CPU what to do. You can write your own printf function without resorting to assembly. The only operations that must be written in assembly are input and output from ports, and things that enable and disable interrupts.

However, assembly is still used in system level programming for performance reasons. Even though inline assembly is not supported, there is nothing that prevents you from writing a separate module in assembly and linking it to your application.

In general, library functions are precompiled and distributed as object code. Inline assembler is used only in particular situations, for performance reasons, but it's the exception, not the rule. Actually, printf doesn't seem to me a good candidate for inline assembly; functions like memcpy or memcmp are better ones. Very low-level functions may be assembled by a native assembler (masm? gnu asm?) and distributed as objects in a library.

The compiler generates the assembly from the C/C++ source code.

Source: https://stackoverflow.com/questions/2442966/c-c-function-definitions-without-assembly
