I'm currently profiling an application with performance problems using Valgrind's "Callgrind". In looking at the profiling data, it appears that a good 25% of processing tim
On POSIX systems, thread_specific_ptr uses pthread_setspecific/pthread_getspecific, which is not the fastest way to reach thread-local data.
If you are on a POSIX system and your compiler supports it, you can use the __thread storage specifier (a compiler extension, not part of POSIX itself). However, it can only be used with initializers that are constant expressions, e.g. gcc's __thread
For Windows, a similar specifier is __declspec(thread).
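To make the two specifiers concrete, here is a minimal sketch of a per-thread counter. The macro TLS_SPEC and the functions bump_counter and count_on_new_thread are hypothetical names made up for this illustration; they are not part of Boost or any library. It assumes a C++11 compiler (for std::thread) and either GCC/Clang or MSVC.

```cpp
#include <cassert>
#include <thread>

// Hypothetical portability macro (not from Boost): selects the
// compiler-specific thread-local storage specifier.
#if defined(_MSC_VER)
#  define TLS_SPEC __declspec(thread)   // MSVC
#else
#  define TLS_SPEC __thread             // GCC/Clang extension
#endif

// Each thread gets its own copy. The initializer must be a
// constant expression, so plain values like 0 are fine.
static TLS_SPEC int tls_counter = 0;

// Increments the calling thread's copy and returns the new value.
int bump_counter() {
    return ++tls_counter;
}

// Runs n increments on a fresh thread and returns the last value
// that thread observed. The new thread starts from its own zero.
int count_on_new_thread(int n) {
    assert(n >= 0);
    int result = 0;
    std::thread worker([&result, n] {
        for (int i = 0; i < n; ++i) {
            result = bump_counter();
        }
    });
    worker.join();
    return result;
}
```

Calling bump_counter() on one thread does not affect the value seen by another, since each thread works on its own tls_counter. Whether this actually beats thread_specific_ptr in your case is something to confirm by profiling again. If you can use C++11, the standard thread_local keyword covers the same ground without the macro.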