I wrote something using atomics rather than locks and, perplexed that it was so much slower in my case, I wrote the following mini test:
```cpp
#include <pthread.h>
#include <vector>

struct test {
    test(size_t size) : index_(0), size_(size), vec2_(size) {
        vec_.reserve(size_);
        pthread_mutexattr_init(&attrs_);
        pthread_mutexattr_setpshared(&attrs_, PTHREAD_PROCESS_PRIVATE);
        pthread_mutexattr_settype(&attrs_, PTHREAD_MUTEX_ADAPTIVE_NP);
        pthread_mutex_init(&lock_, &attrs_);
    }

    void lockedPush(int i);
    void atomicPush(int* i);

    size_t index_;
    size_t size_;
    std::vector<int> vec_;
    std::vector<int> vec2_;
    pthread_mutexattr_t attrs_;
    pthread_mutex_t lock_;
};

void test::lockedPush(int i) {
    pthread_mutex_lock(&lock_);
    vec_.push_back(i);
    pthread_mutex_unlock(&lock_);
}

void test::atomicPush(int* i) {
    int ii = (int) (i - &vec2_.front());
    size_t index = __sync_fetch_and_add(&index_, 1);
    vec2_[index & (size_ - 1)] = ii;
}

int main(int argc, char** argv) {
    const size_t N = 1048576;
    test t(N);

    // for (int i = 0; i < N; ++i)
    //     t.lockedPush(i);

    for (int i = 0; i < N; ++i)
        t.atomicPush(&i);
}
```
If I uncomment the lockedPush loop (and comment out the atomic one) and run the test with time(1), I get output like:

```
real    0m0.027s
user    0m0.022s
sys     0m0.005s
```
and if I run the loop calling the atomic version (the seemingly unnecessary pointer arithmetic is there because I want the function to look as much as possible like what my bigger code does), I get output like:

```
real    0m0.046s
user    0m0.043s
sys     0m0.003s
```
I'm not sure why this is happening; I would have expected the atomic to be faster than the lock in this case...
When I compile with -O3 I see the lock and atomic timings as follows:

lock:
```
real    0m0.024s
user    0m0.022s
sys     0m0.001s
```

atomic:
```
real    0m0.013s
user    0m0.011s
sys     0m0.002s
```
In my larger app, though, the lock still performs better (in single-threaded testing) regardless...