Is it normal that the gcc atomic builtins are so slow?
I have an application where I have to increment some statistics counters in a multi-threaded method. The increments have to be thread-safe, so I decided to use GCC's atomic builtins, specifically __sync_add_and_fetch(). Just to get an idea of their impact, I did some simple performance testing and noticed that these builtins are much slower than a plain pre/post increment.

Here is the test program that I created:

    #include <iostream>
    #include <pthread.h>
    #include <stdint.h>   // uint64_t
    #include <time.h>

    using namespace std;

    uint64_t diffTimes(struct timespec &start, struct timespec &end)
    {
      if(start.tv_sec == end.tv_sec)
      {