I'm experimenting with the new C++11 threads, but my simple test has abysmal multicore performance. As a simple example, this program adds up some squared random numbers.
As you discovered, rand is the culprit here.
For those who are curious, it's possible that this behavior comes from your implementation of rand using a mutex for thread safety.
For example, eglibc defines rand in terms of __random, which is defined as:
    long int
    __random ()
    {
      int32_t retval;
      __libc_lock_lock (lock);
      (void) __random_r (&unsafe_state, &retval);
      __libc_lock_unlock (lock);
      return retval;
    }
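With this kind of locking, every call to rand from any thread has to take the same lock. Threads that call rand in a tight loop then spend most of their time waiting on each other and effectively run serially, so adding cores does not make the program faster.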
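One way to sidestep the shared lock is to give each thread its own generator, so no state is shared between threads. The following is only a minimal sketch, not your original program: the helper names sum_squares and parallel_sum_squares, the seed values, and the choice of std::mt19937 with a uniform distribution on [0, 1) are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <thread>
#include <vector>

// Hypothetical helper: sums `count` squared random numbers using an engine
// that is local to this call, so no lock is shared with other threads.
double sum_squares(std::uint32_t seed, std::size_t count) {
    std::mt19937 engine(seed);  // private generator state
    std::uniform_real_distribution<double> dist(0.0, 1.0);
    double sum = 0.0;
    for (std::size_t i = 0; i < count; ++i) {
        const double x = dist(engine);
        sum += x * x;
    }
    return sum;
}

// Hypothetical driver: runs sum_squares on `num_threads` threads, each with
// a different (arbitrarily chosen) seed, and adds up the partial results.
double parallel_sum_squares(unsigned num_threads, std::size_t count_per_thread) {
    assert(num_threads > 0);
    std::vector<double> partial(num_threads, 0.0);
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        // Each thread writes only its own slot, once, after finishing its loop.
        workers.emplace_back([&partial, t, count_per_thread] {
            partial[t] = sum_squares(1234u + t, count_per_thread);
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    double total = 0.0;
    for (double p : partial) {
        total += p;
    }
    return total;
}
```

Calling parallel_sum_squares(4, 100000) from main would run four independent loops. Since the engines share nothing, the threads should no longer serialize on a lock the way they can with rand. On Linux you will likely need to compile with -pthread.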