In short: it depends.
There are many benchmarks, each focusing on a different kind of application.
My benchmark on my own application gives: gcc > icc > clang.
The workload does very little I/O but many CPU-bound floating-point and data structure operations.
The compile flags were -Wall -g -DNDEBUG -O3.
https://github.com/zhangyafeikimi/ml-pack/blob/master/gbdt/profile/benchmark
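If you want to run this kind of comparison on your own code, a minimal timing sketch could look like the one below. It is not taken from the linked benchmark. The names float_workload and best_time_seconds are hypothetical, and the workload is only a stand-in for the CPU-bound floating-point and data structure work described above.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical stand-in workload: fill a vector, then sum the squares.
// Replace this with the hot path of your own application.
double float_workload(std::size_t n) {
    std::vector<double> values(n);
    for (std::size_t i = 0; i < n; ++i) {
        values[i] = static_cast<double>(i) * 0.5;
    }
    double sum = 0.0;
    for (double v : values) {
        sum += v * v;
    }
    return sum;
}

// Runs the workload `repeats` times and returns the best wall-clock time
// in seconds. The last result is written to `checksum` so the compiler
// cannot discard the computation at -O3.
double best_time_seconds(std::size_t n, int repeats, double* checksum) {
    using clock = std::chrono::steady_clock;
    double best = -1.0;
    for (int r = 0; r < repeats; ++r) {
        const auto start = clock::now();
        const double result = float_workload(n);
        const auto stop = clock::now();
        *checksum = result;
        const double elapsed =
            std::chrono::duration<double>(stop - start).count();
        if (best < 0.0 || elapsed < best) {
            best = elapsed;
        }
    }
    return best;
}
```

Call best_time_seconds from a small main, build the same source with each compiler using identical flags (for example g++, clang++ and icpc with -Wall -g -DNDEBUG -O3), and compare the reported times. Taking the best of several runs reduces noise from the rest of the system, but the ranking you get will still depend on your workload.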