GCC: putchar(char) in inline assembly

流过昼夜 提交于 2019-11-29 12:21:24

When using GNU C inline asm, use constraints to tell the compiler where you want things, instead of doing it "manually" with instructions inside the asm template.

For writechar and readchar, we only need a "syscall" as the template, with constraints to set up all the inputs in registers (and the pointed-to char in memory for the write(2) system call), according to the x86-64 Linux system-call convention (which very closely matches the System V ABI's function-calling convention). What are the calling conventions for UNIX & Linux system calls on i386 and x86-64.

This also makes it easy to avoid clobbering the red-zone (128 bytes below RSP), where the compiler might be keeping values. You must not clobber it from inline asm (so push / pop aren't usable unless you sub rsp, 128 first: see Using base pointer register in C++ inline asm for that and many useful links about GNU C inline asm), and there's no way to tell the compiler you clobber it. You could build with -mno-redzone, but in this case input/output operands are much better.


I'm hesitant to call these putchar and getchar. You can do that if you're implementing your own stdio that doesn't support buffering yet, but some functions require input buffering to implement correctly. For example, scanf has to examine characters to see if they match the format string, and leave them "unread" if they don't. Output buffering is optional, though; you could I think fully implement stdio with functions that create a private buffer and write() it, or directly write() their input pointer.

writechar():

int writechar(char my_char)
{
    int retval;  // sys_write uses ssize_t, but we only pass len=1
                 // so the return value is either 1 on success or  -1..-4095 for error
                 // and thus fits in int

    asm volatile("syscall  #dummy arg picked %[dummy]\n"
                    : "=a" (retval)  /* output in EAX */
                    /* inputs: ssize_t read(int fd, const void *buf, size_t count); */
                    : "D"(1),         // RDI = fd=stdout
                      "S"(&my_char),  // RSI = buf
                      "d"(1)          // RDX = length
                      , [dummy]"m" (my_char) // dummy memory input, otherwise compiler doesn't store the arg
                    /* clobbered regs */
                    : "rcx", "r11"  // clobbered by syscall
                );
    // It doesn't matter what addressing mode "m"(my_char) picks,
    // as long as it refers to the same memory as &my_char so the compiler actually does a store

    return retval;
}

This compiles very efficiently with gcc -O3, on the Godbolt compiler explorer.

writechar:
    movb    %dil, -4(%rsp)        # store my_char into the red-zone
    movl    $1, %edi
    leaq    -4(%rsp), %rsi
    movl    %edi, %edx            # optimize because fd = len
    syscall               # dummy arg picked -4(%rsp)

    ret

@nrz's test main inlines it much more efficiently than the unsafe (red-zone clobbering) version in that answer, taking advantage of the fact that syscall leaves most registers unmodified so it can just set them once.

main:
    movl    $97, %r8d            # my_char = 'a'
    leaq    -1(%rsp), %rsi       # rsi = &my_char
    movl    $1, %edx             # len
.L6:                           # do {
    movb    %r8b, -1(%rsp)       # store the char into the buffer
    movl    %edx, %edi           # silly compiler doesn't hoist this out of the loop
    syscall  #dummy arg picked -1(%rsp)

    addl    $1, %r8d
    cmpb    $123, %r8b
    jne     .L6                # } while(++my_char < 'z'+1)

    movb    $10, -1(%rsp)
    syscall  #dummy arg picked -1(%rsp)

    xorl    %eax, %eax         # return 0
    ret

readchar(), done the same way:

int readchar(void)
{
    int retval;
    unsigned char my_char;
    asm volatile("syscall  #dummy arg picked %[dummy]\n"
                    /* outputs */
                    : "=a" (retval)
                     ,[dummy]"=m" (my_char) // tell the compiler the asm dereferences &my_char

                    /* inputs: ssize_t read(int fd, void *buf, size_t count); */
                    : "D"(0),         // RDI = fd=stdin
                      "S" (&my_char), // RDI = buf
                      "d"(1)          // RDX = length

                    : "rcx", "r11"  // clobbered by syscall
                );
    if (retval < 0)   // -1 .. -4095 are -errno values
        return retval;
    return my_char;   // else a 0..255 char / byte
}

Callers can check for error by checking c < 0.

Here's an example my_putchar in GCC x86-64 inline assembly (in Intel syntax, converting to AT&T should be trivial).

Compiles with:

gcc -ggdb -masm=intel -o gcc_asm_putchar gcc_asm_putchar.c

Edit: rdi was missing from clobbered registers. Fixed.

Here's the code:

int main(void)
{
    char my_char;

    for (my_char = 'a'; my_char <= 'z'; my_char++)
            my_putchar(my_char);

    my_char = '\n';
    my_putchar(my_char);
    return 0;
}

void my_putchar(char my_char)
{
    int dword_char;
    dword_char = (int)my_char;
    asm volatile(
                    ".intel_syntax noprefix;"
                    "mov r10,rsp;"   // save rsp.
                    "sub rsp,8;"     // space for buffer, align by 8.
                    "mov [rsp],al;"  // store the character into buffer.
                    "mov edi,1;"     // STDOUT.
                    "mov rsi,rsp;"   // pointer to buffer.
                    "mov edx,1;"     // string length in bytes.
                    "mov eax,1;"     // WRITE.
                    "syscall;"       // clobbers rcx & r11.
                    "mov rsp,r10;"   // restore rsp.
                    ".att_syntax prefix;"
                    /* outputs */
                    :
                    /* inputs: eax */
                    : "a"(dword_char)
                    /* clobbered regs */
                    : "rcx", "rdx", "rsi", "rdi", "r10", "r11"
                );
}

Note that getchar(3)/putchar(3) are macros (for performance) which mess around with complex data in the FILE structure for stdin/stdout, specifically handling buffering and other. The answer by nrz just does a 1 char write(3) to file descriptor 1, something very different.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!