How to understand macro `likely` affecting branch prediction?

Question

I noticed that when we know a branch condition is very likely to be true or false, we can tell the compiler. For instance, the Linux kernel uses likely and unlikely in many places; they are implemented with __builtin_expect, which gcc provides. I wanted to find out how this works, so I checked the generated assembly:

  20:branch_prediction_victim.cpp ****             if (array_aka[j] >= 128)
 184                            .loc 3 20 0 is_stmt 1
 185 00f1 488B85D0              movq    -131120(%rbp), %rax
 185      FFFDFF
 186 00f8 8B8485F0              movl    -131088(%rbp,%rax,4), %eax
 186      FFFDFF
 187 00ff 83F87F                cmpl    $127, %eax
 188 0102 7E17                  jle     .L13

Then with __builtin_expect:

  20:branch_prediction_victim.cpp ****             if (__builtin_expect((array_aka[j] >= 128), 1))
 184                            .loc 3 20 0 is_stmt 1
 185 00f1 488B85D0              movq    -131120(%rbp), %rax
 185      FFFDFF
 186 00f8 8B8485F0              movl    -131088(%rbp,%rax,4), %eax
 186      FFFDFF
 187 00ff 83F87F                cmpl    $127, %eax
 188 0102 0F9FC0                setg    %al
 189 0105 0FB6C0                movzbl  %al, %eax
 190 0108 4885C0                testq   %rax, %rax
 191 010b 7417                  je      .L13

188 - setg: set if greater. Greater than what here?
189 - movzbl: move with zero-extension from byte to long. I know this one moves %al into %eax.
190 - testq: bitwise OR, then sets the ZF and CF flags. Is this right?

I want to know how these affect branch prediction and improve performance. There are three extra instructions, so more cycles are needed, right?

Answer 1:

setcc reads FLAGS, in this case set by the cmp right before it, so setg here means "set %al to 1 if %eax was greater than 127 (signed), else 0". Read the manual.

This looks like you forgot to enable optimization, so __builtin_expect is just creating a 0 / 1 boolean value in a register and branching on it being non-zero, instead of branching on the original FLAGS condition. Don't look at un-optimized code, it's always going to suck.

The clues are: the braindead booleanizing as part of likely, and loading j from the stack using RBP as a frame pointer with movq -131120(%rbp), %rax
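To make the three per-instruction questions concrete, here is a rough C++ model of what the unoptimized cmpl / setg / movzbl / testq / je sequence computes. The function name is hypothetical and exists only for illustration. Note that test performs a bitwise AND (not OR) of its operands, discards the result, and sets ZF from it; it also clears CF.

```cpp
#include <cstdint>

// Hypothetical helper (not from the original code): returns true when
// the final "je .L13" would jump, i.e. when the if-body is skipped.
bool jumps_to_L13(std::int32_t value) {
    // cmpl $127, %eax ; setg %al
    // setg reads the FLAGS written by cmp: al = 1 if value > 127 (signed).
    std::uint8_t al = (value > 127) ? 1 : 0;

    // movzbl %al, %eax
    // Zero-extend the byte; writing %eax also zeroes the top half of %rax.
    std::uint64_t rax = static_cast<std::uint32_t>(al);

    // testq %rax, %rax
    // Bitwise AND of rax with itself; ZF = 1 when the result is zero.
    bool zf = ((rax & rax) == 0);

    // je .L13 jumps when ZF is set.
    return zf;
}
```

With optimization enabled, the compiler would normally be expected to drop the setg / movzbl / testq steps and branch directly on the FLAGS from cmp.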

likely generally doesn't improve runtime branch prediction; it improves code layout to minimize the number of taken branches when things go the way the source code said they would (i.e. the fast case). So it improves I-cache locality for the common case. For example, the compiler will lay things out so the common case is a not-taken conditional branch, just falling through. This makes things easier for the front-end in superscalar pipelined CPUs that fetch/decode multiple instructions at once. Continuing to fetch in a straight line is easiest.
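As a minimal sketch of how the hint is usually written, the macros below follow the Linux-kernel style of wrapping __builtin_expect. They assume GCC or Clang; the function sum_large and its parameters are made up for illustration. The hint does not change the result, only (potentially) how the compiler lays out the two paths when optimization is enabled.

```cpp
#include <cstddef>
#include <cstdint>

// Kernel-style wrappers; !!(x) normalizes any truthy value to 0 or 1.
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

// Hypothetical example: sums the elements that are >= 128, telling the
// compiler that this condition is expected to be true most of the time.
std::int64_t sum_large(const int* data, std::size_t n) {
    std::int64_t sum = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (likely(data[i] >= 128)) {
            sum += data[i];   // expected path: ideally reached by fall-through
        }
    }
    return sum;
}
```

To see the effect on layout, compare the assembly with and without the hint at an optimization level such as -O2 rather than in an unoptimized build.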

likely can actually get the compiler to use a branch instead of a cmov for cases that you know are predictable, even if compiler heuristics (without profile-guided optimization) would have gotten it wrong. Related: gcc optimization flag -O3 makes code slower than -O2
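A small sketch of the kind of code where that choice comes up: both functions below compute the same result, and only the second carries a hint. The function names are hypothetical, and whether a given compiler actually emits a cmov for the first and a conditional branch for the second depends on the compiler version, target, and flags, so check the output of something like g++ -O2 -S rather than assuming it.

```cpp
// Hypothetical example: a simple select that compilers often consider
// for branchless (cmov) code generation.
int clamp_plain(int x, int limit) {
    return x > limit ? limit : x;
}

// Same logic, with a hint that exceeding the limit is rare. This may
// nudge the compiler toward a branch with the common case falling through.
int clamp_hinted(int x, int limit) {
    if (__builtin_expect(x > limit, 0)) {
        return limit;   // rare path
    }
    return x;           // common path
}
```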

Source: https://stackoverflow.com/questions/61030543/how-to-understand-macro-likely-affecting-branch-prediction

Tags

performance

assembly

x86

branch-prediction