A peculiar difference I have noted between gcc 5.2.1 and clang 3.6.2 is
that if you have a critical loop like:
    for (;;) {
        if (!visited) {
            ....
        }
        node++;
        if (!*node) break;
    }
Then gcc will, when compiling with -O3 or -O2, speculatively
unroll the loop eight times. Clang will not unroll it at all. Through
trial and error I found that in my specific case, with my program data,
the right unroll factor is five, so gcc overshot and clang undershot.
However, overshooting hurt performance more than undershooting did, so
gcc performed much worse here.
I have no idea if the unrolling difference is a general trend or
just something that was specific to my scenario.
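To make the unrolling concrete, here is a minimal sketch of what a five-way hand-unrolled version of such a loop could look like. The node_t type, its visited field, the STEP macro and both function names are assumptions made up for illustration; they are not the real collector code, and the loop body (marking the node) is only a stand-in for the elided work.

```c
#include <stddef.h>

/* Hypothetical node type; the real layout is not shown in the post. */
typedef struct node {
    int visited;
} node_t;

/* Plain loop: walks a NULL-terminated array of node pointers (assumed to
   hold at least one entry) and marks every node not yet visited. */
static size_t mark_plain(node_t **node)
{
    size_t marked = 0;
    for (;;) {
        if (!(*node)->visited) {
            (*node)->visited = 1;   /* stand-in for the real loop body */
            marked++;
        }
        node++;
        if (!*node) break;
    }
    return marked;
}

/* One iteration of the loop above, including the exit test, so that it
   can be repeated without changing behavior. */
#define STEP()                          \
    do {                                \
        if (!(*node)->visited) {        \
            (*node)->visited = 1;       \
            marked++;                   \
        }                               \
        node++;                         \
        if (!*node) return marked;      \
    } while (0)

/* Same loop unrolled by hand with a factor of five. Every copy keeps its
   own exit test, so the array length need not be a multiple of five. */
static size_t mark_unrolled(node_t **node)
{
    size_t marked = 0;
    for (;;) {
        STEP(); STEP(); STEP(); STEP(); STEP();
    }
}
```

Changing the number of STEP() calls is a simple way to try different unroll factors by hand and time each one, instead of relying on whatever factor the compiler picks.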
A while back I wrote a few garbage collectors to teach myself more
about performance optimization in C. The results I got are, in my
mind, enough to slightly favor clang, especially since garbage
collection is mostly about pointer chasing and copying memory.
The results are (numbers in seconds):
+---------------------+-----+-----+
|Type                 |GCC  |Clang|
+---------------------+-----+-----+
|Copying GC           |22.46|22.55|
|Copying GC, optimized|22.01|20.22|
|Mark & Sweep         | 8.72| 8.38|
|Ref Counting/Cycles  |15.14|14.49|
|Ref Counting/Plain   | 9.94| 9.32|
+---------------------+-----+-----+
This is all pure C code, and I make no claim about either compiler's
performance when compiling C++ code.
All measurements were taken on Ubuntu 15.10, x86-64, with an AMD Phenom(tm) II X6 1090T processor.