Specifically, if I have a series of if...else if statements, and I somehow know beforehand the relative probability that each statement will evalua
Just my 5 cents. It seems the effect of ordering if statements should depend on:
Probability of each if statement.
Number of iterations, so the branch predictor could kick in.
Likely/unlikely compiler hints, i.e. code layout.
To explore those factors, I benchmarked the following functions:
for (i = 0; i < data_sz * 1024; i++) {
if (data[i] < check_point) // highly likely
s += 3;
else if (data[i] > check_point) // samewhat likely
s += 2;
else if (data[i] == check_point) // very unlikely
s += 1;
}
for (i = 0; i < data_sz * 1024; i++) {
if (data[i] == check_point) // very unlikely
s += 1;
else if (data[i] > check_point) // samewhat likely
s += 2;
else if (data[i] < check_point) // highly likely
s += 3;
}
for (i = 0; i < data_sz * 1024; i++) {
if (likely(data[i] < check_point)) // highly likely
s += 3;
else if (data[i] > check_point) // samewhat likely
s += 2;
else if (unlikely(data[i] == check_point)) // very unlikely
s += 1;
}
for (i = 0; i < data_sz * 1024; i++) {
if (unlikely(data[i] == check_point)) // very unlikely
s += 1;
else if (data[i] > check_point) // samewhat likely
s += 2;
else if (likely(data[i] < check_point)) // highly likely
s += 3;
}
The data array contains random numbers between 0 and 100:
const int RANGE_MAX = 100;
uint8_t data[DATA_MAX * 1024];
static void data_init(int data_sz)
{
int i;
srand(0);
for (i = 0; i < data_sz * 1024; i++)
data[i] = rand() % RANGE_MAX;
}
The following results are for Intel i5@3,2 GHz and G++ 6.3.0. The first argument is the check_point (i.e. probability in %% for the highly likely if statement), the second argument is data_sz (i.e. number of iterations).
---------------------------------------------------------------------
Benchmark Time CPU Iterations
---------------------------------------------------------------------
ordered_ifs/50/4 4660 ns 4658 ns 150948
ordered_ifs/50/8 25636 ns 25635 ns 27852
ordered_ifs/75/4 4326 ns 4325 ns 162613
ordered_ifs/75/8 18242 ns 18242 ns 37931
ordered_ifs/100/4 1673 ns 1673 ns 417073
ordered_ifs/100/8 3381 ns 3381 ns 207612
reversed_ifs/50/4 5342 ns 5341 ns 126800
reversed_ifs/50/8 26050 ns 26050 ns 26894
reversed_ifs/75/4 3616 ns 3616 ns 193130
reversed_ifs/75/8 15697 ns 15696 ns 44618
reversed_ifs/100/4 3738 ns 3738 ns 188087
reversed_ifs/100/8 7476 ns 7476 ns 93752
ordered_ifs_with_hints/50/4 5551 ns 5551 ns 125160
ordered_ifs_with_hints/50/8 23191 ns 23190 ns 30028
ordered_ifs_with_hints/75/4 3165 ns 3165 ns 218492
ordered_ifs_with_hints/75/8 13785 ns 13785 ns 50574
ordered_ifs_with_hints/100/4 1575 ns 1575 ns 437687
ordered_ifs_with_hints/100/8 3130 ns 3130 ns 221205
reversed_ifs_with_hints/50/4 6573 ns 6572 ns 105629
reversed_ifs_with_hints/50/8 27351 ns 27351 ns 25568
reversed_ifs_with_hints/75/4 3537 ns 3537 ns 197470
reversed_ifs_with_hints/75/8 16130 ns 16130 ns 43279
reversed_ifs_with_hints/100/4 3737 ns 3737 ns 187583
reversed_ifs_with_hints/100/8 7446 ns 7446 ns 93782
For 4K iterations and (almost) 100% probability of highly liked statement the difference is huge 223%:
---------------------------------------------------------------------
Benchmark Time CPU Iterations
---------------------------------------------------------------------
ordered_ifs/100/4 1673 ns 1673 ns 417073
reversed_ifs/100/4 3738 ns 3738 ns 188087
For 4K iterations and 50% probability of highly liked statement the difference is about 14%:
---------------------------------------------------------------------
Benchmark Time CPU Iterations
---------------------------------------------------------------------
ordered_ifs/50/4 4660 ns 4658 ns 150948
reversed_ifs/50/4 5342 ns 5341 ns 126800
The difference between 4K and 8K iterations for (almost) 100% probability of highly liked statement is about two times (as expected):
---------------------------------------------------------------------
Benchmark Time CPU Iterations
---------------------------------------------------------------------
ordered_ifs/100/4 1673 ns 1673 ns 417073
ordered_ifs/100/8 3381 ns 3381 ns 207612
But the difference between 4K and 8K iterations for 50% probability of highly liked statement is 5,5 times:
---------------------------------------------------------------------
Benchmark Time CPU Iterations
---------------------------------------------------------------------
ordered_ifs/50/4 4660 ns 4658 ns 150948
ordered_ifs/50/8 25636 ns 25635 ns 27852
Why is so? Because of branch predictor misses. Here is the branch misses for each mentioned above case:
ordered_ifs/100/4 0.01% of branch-misses
ordered_ifs/100/8 0.01% of branch-misses
ordered_ifs/50/4 3.18% of branch-misses
ordered_ifs/50/8 15.22% of branch-misses
So on my i5 the branch predictor fails spectacularly for not-so-likely branches and large data sets.
For 4K iterations the results are somewhat worse for 50% probability and somewhat better for close to 100% probability:
---------------------------------------------------------------------
Benchmark Time CPU Iterations
---------------------------------------------------------------------
ordered_ifs/50/4 4660 ns 4658 ns 150948
ordered_ifs/100/4 1673 ns 1673 ns 417073
ordered_ifs_with_hints/50/4 5551 ns 5551 ns 125160
ordered_ifs_with_hints/100/4 1575 ns 1575 ns 437687
But for 8K iterations the results are always a bit better:
---------------------------------------------------------------------
Benchmark Time CPU Iterations
---------------------------------------------------------------------
ordered_ifs/50/8 25636 ns 25635 ns 27852
ordered_ifs/100/8 3381 ns 3381 ns 207612
ordered_ifs_with_hints/50/8 23191 ns 23190 ns 30028
ordered_ifs_with_hints/100/8 3130 ns 3130 ns 221205
So, the hints also help, but just a tiny bit.
Overall conclusion is: always benchmark the code, because the results may surprise.
Hope that helps.