How do I stop GCC from optimizing this byte-for-byte copy into a memcpy call?

Posted by 江枫思渺然 on 2019-12-05 21:41:00

Question


As part of my implementation of the standard C library, I have this code for memcpy, which copies memory from src to dest one byte at a time:

#include <stddef.h>  /* size_t */

void *memcpy(void *restrict dest, const void *restrict src, size_t len)
{
    char *dp = (char *restrict)dest;
    const char *sp = (const char *restrict)src;

    while( len-- )
    {
        *dp++ = *sp++;
    }

    return dest;
}

With gcc -O2, the code generated is reasonable:

memcpy:
.LFB0:
        movq    %rdi, %rax
        testq   %rdx, %rdx
        je      .L2
        xorl    %ecx, %ecx
.L3:
        movzbl  (%rsi,%rcx), %r8d
        movb    %r8b, (%rax,%rcx)
        addq    $1, %rcx
        cmpq    %rdx, %rcx
        jne     .L3
.L2:
        ret
.LFE0:

However, at gcc -O3, GCC optimizes this naive byte-for-byte copy into a memcpy call:

memcpy:
.LFB0:
        testq   %rdx, %rdx
        je      .L7
        subq    $8, %rsp
        call    memcpy
        addq    $8, %rsp
        ret
.L7:
        movq    %rdi, %rax
        ret
.LFE0:

This won't work (memcpy unconditionally calls itself), and it causes a segfault.

I've tried passing -fno-builtin-memcpy and -fno-loop-optimizations, and the same thing occurs.

I'm using GCC version 8.3.0:

Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/local/libexec/gcc/x86_64-cros-linux-gnu/8.3.0/lto-wrapper
Target: x86_64-cros-linux-gnu
Configured with: ../configure --prefix=/usr/local --libdir=/usr/local/lib64 --build=x86_64-cros-linux-gnu --host=x86_64-cros-linux-gnu --target=x86_64-cros-linux-gnu --enable-checking=release --disable-multilib --enable-threads=posix --disable-bootstrap --disable-werror --disable-libmpx --enable-static --enable-shared --program-suffix=-8.3.0 --with-arch-64=x86-64
Thread model: posix
gcc version 8.3.0 (GCC) 

How do I disable the optimization that causes the copy to be transformed into a memcpy call?


Answer 1:


One option that seems to be sufficient here: instead of -fno-builtin-memcpy, pass -fno-builtin, and only when compiling the translation unit that defines memcpy.

An alternative would be to pass -fno-tree-loop-distribute-patterns, though this might be brittle, as it forbids the compiler from first reorganizing the loop code and then replacing parts of it with calls to mem* functions.

Or, since you cannot rely on anything in the C library, -ffreestanding might be in order.
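The -fno-tree-loop-distribute-patterns suggestion can also be scoped to a single function with GCC's optimize function attribute, rather than applied to a whole translation unit. The following is a minimal sketch, assuming GCC; the macro name NO_LOOP_TO_LIBCALL and the function name byte_copy are hypothetical (a real C library would name the function memcpy), and GCC's documentation describes the optimize attribute as intended mainly for debugging, so treat this as an illustration rather than a recommended fix:

```c
#include <stddef.h>

/* GCC-only: ask GCC not to turn loops in this one function into mem* calls.
   Other compilers get an empty macro, so the code still builds there. */
#if defined(__GNUC__) && !defined(__clang__)
#define NO_LOOP_TO_LIBCALL \
    __attribute__((optimize("no-tree-loop-distribute-patterns")))
#else
#define NO_LOOP_TO_LIBCALL
#endif

/* Hypothetical name; in a libc implementation this would be memcpy. */
NO_LOOP_TO_LIBCALL
void *byte_copy(void *restrict dest, const void *restrict src, size_t len)
{
    unsigned char *dp = dest;
    const unsigned char *sp = src;

    /* Copy one byte at a time, as in the question's loop. */
    while (len--) {
        *dp++ = *sp++;
    }
    return dest;
}
```

Whether the loop survives as a loop is best confirmed by inspecting the generated assembly (for example with gcc -O3 -S), since the result may differ between GCC versions.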




Answer 2:


"This won't work (memcpy unconditionally calls itself), and it causes a segfault."

In a hosted implementation, defining your own function named memcpy is undefined behavior, because the name is reserved for the C standard library.

"How do I disable the optimization that causes the copy to be transformed into a memcpy call (preferably while still compiling with -O3)?"

Don't. The best approach is fixing your code instead:

  • In most cases, you should use another name.

  • In the rare case you are really implementing a C library (as discussed in the comments), and you really want to reimplement memcpy, then you should be using compiler-specific options to achieve that. For GCC, see -fno-builtin* and -ffreestanding, as well as -nodefaultlibs and -nostdlib.
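The first bullet, using another name, could look like the sketch below. The name copy_bytes is hypothetical and was picked only because it does not collide with any standard library identifier:

```c
#include <stddef.h>

/* Hypothetical, non-reserved name for a byte-for-byte copy routine. */
void *copy_bytes(void *restrict dest, const void *restrict src, size_t len)
{
    unsigned char *dp = dest;
    const unsigned char *sp = src;

    while (len--) {
        *dp++ = *sp++;
    }
    return dest;
}
```

With a distinct name, GCC may still replace the loop with a call to memcpy at -O3, but that call then resolves to the C library's memcpy rather than recursing into the function being compiled.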



Source: https://stackoverflow.com/questions/57539509/how-do-i-stop-gcc-from-optimizing-this-byte-for-byte-copy-into-a-memcpy-call