How to prevent GCC from optimizing out a busy wait loop?

Submitted by 馋奶兔 on 2019-11-26 15:59:31
Denilson Sá Maia

I developed this answer after following a link from dmckee's answer, but it takes a different approach than his/her answer.

The GCC Function Attributes documentation mentions:

noinline: This function attribute prevents a function from being considered for inlining. If the function does not have side effects, there are optimizations other than inlining that cause function calls to be optimized away, although the function call is live. To keep such calls from being optimized away, put asm (""); in the called function, to serve as a special side effect.

This gave me an interesting idea: instead of adding a nop instruction in the inner loop, I tried adding an empty assembly statement there, like this:

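/* Nested 8-bit countdown: --j and --i on an unsigned char that starts
   at 0 wrap around to 255, so each loop runs 255 times. */
void delay(void)
{
    unsigned char i, j;
    j = 0;
    while(--j) {
        i = 0;
        while(--i)
            asm("");  /* emits nothing; basic asm is implicitly volatile */
    }
}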

And it worked! The loop was not optimized out, and no extra nop instructions were inserted.

What's more, if you use volatile, GCC will store those variables in RAM and add a bunch of ldd and std instructions to copy them to and from temporary registers. This approach, on the other hand, doesn't use volatile and generates no such overhead.


Update: If you are compiling with -ansi or a strict ISO -std mode (such as -std=c99), you must replace the asm keyword with __asm__, as described in the GCC documentation.

In addition, you can use __asm__ __volatile__("") if your assembly statement must execute where you put it (i.e. must not be moved out of a loop as an optimization).
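A minimal sketch of the same delay written with the __asm__ __volatile__ spelling, so it also compiles under -ansi or -std=c99. It assumes GCC or a GCC-compatible compiler. The name spin_delay, its outer parameter, and the returned counter are hypothetical additions; the counter exists only so the loop can be checked and can be dropped in real code.

```c
#include <stdint.h>

/* Hypothetical helper: busy-wait for `outer` passes of a 255-step inner loop.
   __asm__ __volatile__("") emits no instruction, but GCC must treat it as a
   side effect, so it cannot delete the loop that contains it. The __asm__
   spelling is accepted even in strict ISO modes (-ansi, -std=c99).
   The return value (inner iterations executed) exists only for testing. */
static uint32_t spin_delay(uint8_t outer)
{
    uint32_t count = 0;

    while (outer--) {
        uint8_t i = 0;
        /* --i on an 8-bit zero wraps to 255, so the body runs 255 times */
        while (--i) {
            __asm__ __volatile__("");
            count++;
        }
    }
    return count;
}
```

For example, spin_delay(2) executes the empty asm statement 510 times and returns 510.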

Declare the i and j variables as volatile. This prevents the compiler from optimizing away code that involves these variables.

volatile unsigned char i, j;
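Put together, the volatile variant of the original loop might look like the sketch below. The function name volatile_delay is hypothetical, and returning j (always 0 when the loop exits) is only there so the call has a checkable result; the volatile qualifier, not the return value, is what keeps the loop.

```c
/* Hypothetical helper: the original nested delay with volatile counters.
   Every read and write of a volatile object must really happen, so GCC
   cannot drop the loop; on AVR this costs extra ldd/std instructions. */
static unsigned char volatile_delay(void)
{
    volatile unsigned char i, j;

    j = 0;
    while (--j) {      /* 0 wraps to 255: 255 outer passes */
        i = 0;
        while (--i)    /* 255 inner passes with an empty body */
            ;
    }
    return j;          /* 0 once the outer loop has finished */
}
```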

I'm not sure why it hasn't been mentioned yet that this approach is completely misguided and easily broken by compiler upgrades, etc. It would make a lot more sense to compute the time you want to wait until, then spin polling the current time until that deadline has passed. On x86 you could use rdtsc for this purpose, but the more portable way is to call clock_gettime (or the equivalent for your non-POSIX OS) to get the time. Current x86_64 Linux will even avoid the syscall for clock_gettime and use rdtsc internally. Or, if you can handle the cost of a syscall, just use clock_nanosleep to begin with...
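A minimal sketch of the deadline-polling approach, assuming a POSIX system that provides CLOCK_MONOTONIC; spin_wait_ns is a hypothetical name. Because the loop's exit depends on values returned by clock_gettime(), the compiler cannot remove it, and the delay no longer depends on how tightly the loop body compiles.

```c
#define _POSIX_C_SOURCE 199309L
#include <time.h>

/* Hypothetical helper: busy-wait until `ns` nanoseconds have elapsed
   on CLOCK_MONOTONIC (a clock that is not affected by wall-clock changes). */
static void spin_wait_ns(long long ns)
{
    struct timespec now, deadline;

    /* deadline = now + ns, normalized so tv_nsec stays below 1e9 */
    clock_gettime(CLOCK_MONOTONIC, &deadline);
    deadline.tv_sec  += (time_t)(ns / 1000000000LL);
    deadline.tv_nsec += (long)(ns % 1000000000LL);
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec  += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    /* spin until the current time reaches the deadline */
    do {
        clock_gettime(CLOCK_MONOTONIC, &now);
    } while (now.tv_sec < deadline.tv_sec ||
             (now.tv_sec == deadline.tv_sec && now.tv_nsec < deadline.tv_nsec));
}
```

If sleeping instead of spinning is acceptable, the same absolute deadline can be passed to clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &deadline, NULL), which requires _POSIX_C_SOURCE of at least 200112L.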

I don't know off the top of my head whether the AVR version of the compiler supports the full set of #pragmas (the interesting ones in the link all date from GCC version 4.4), but that is where you would usually start.
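As a sketch of the pragma route, assuming GCC 4.4 or later: #pragma GCC push_options and #pragma GCC optimize lower the optimization level for just the enclosed code, so a plain empty delay loop survives even when the rest of the file is built with -O2 or -O3. The function name unoptimized_delay is hypothetical, and returning j is only there to give the call a checkable result.

```c
/* Compile only the code between push_options and pop_options at -O0. */
#pragma GCC push_options
#pragma GCC optimize ("O0")

/* Hypothetical helper: the original nested delay, left unoptimized. */
static unsigned char unoptimized_delay(void)
{
    unsigned char i, j;

    j = 0;
    while (--j) {      /* 0 wraps to 255: 255 outer passes */
        i = 0;
        while (--i)    /* empty body, kept because this region is -O0 */
            ;
    }
    return j;
}

/* Restore the file's original optimization settings. */
#pragma GCC pop_options
```

GCC also offers __attribute__((optimize("O0"))) to apply the same setting to a single function, which is a per-function stand-in for keeping the loop in its own unoptimized .c file.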

For me, on GCC 4.7.0, the empty asm was optimized away anyway with -O3 (I didn't try -O2), and using an i++ counter in a register or a volatile variable resulted in a big performance penalty (in my case).

What I did was link against an empty function defined in another file, which the compiler couldn't see when compiling the "main program".

Basically this:

Create helper.c containing this (empty) function definition:

void donotoptimize(void) {}

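Then compile it separately with "gcc -c helper.c -o helper.o", declare void donotoptimize(void); in the main program, call it from the loop, and link helper.o into the final executable:

while (...) { donotoptimize(); }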

This gave me the best results (and, I believe, no overhead at all, though I can't test that because my program won't work without it :) )

I think it should work with icc too. It may not work if you enable link-time optimization (e.g. -flto), but with plain gcc it does.

Put that loop in a separate .c file and do not optimize that one file. Even better, write that routine in assembler and call it from C; either way, the optimizer won't get involved.

I sometimes do the volatile thing, but normally I create an asm function that simply returns and put a call to that function in the loop. The optimizer will make the for/while loop tight, but it won't optimize it out because it has to make all the calls to the dummy function. The empty-asm answer from Denilson Sá does the same thing, but even tighter...
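A plain GNU C approximation of the dummy-function idea, following the noinline documentation quoted in the first answer: noinline keeps the call from being inlined away, and the empty asm inside gives the function a side effect, so every call in the loop must be made. Unlike the separate helper.o approach, it does not rely on the compiler being unable to see the function body. The names dummy_call and call_delay are hypothetical, and call_delay returns its iteration count only so the result can be checked.

```c
/* Hypothetical dummy function: cannot be inlined, and the volatile
   empty asm means GCC may not treat it as side-effect free. */
static __attribute__((noinline)) void dummy_call(void)
{
    __asm__ __volatile__("");
}

/* Hypothetical delay loop: performs n real calls to dummy_call(). */
static unsigned long call_delay(unsigned long n)
{
    unsigned long k;

    for (k = 0; k < n; k++)
        dummy_call();   /* one real call per iteration */
    return k;           /* equals n */
}
```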

Putting in a volatile asm statement should help. You can read more about this here:

http://www.nongnu.org/avr-libc/user-manual/optimization.html

If you are working on Windows, you can even try wrapping the code in pragmas, as explained in detail here:

https://www.securecoding.cert.org/confluence/display/cplusplus/MSC06-CPP.+Be+aware+of+compiler+optimization+when+dealing+with+sensitive+data

Hope this helps.

You can also use the register keyword, which asks the compiler to keep the variables in CPU registers (it is only a hint, and the compiler may ignore it).

In your case:

void delay(void)
{
    register unsigned char i, j;
    j = 0;
    while(--j) {
        i = 0;
        while(--i);
    }
}