From the GCC 4.8 draft changelog:
G++ now implements the C++11 thread_local keyword; this differs from the GNU __thread keyword [...]
If the variable is defined in the current TU, the inliner will take care of the overhead. I expect that this will be true of most uses of thread_local.
For extern variables, if the programmer can be sure that no use of the variable in a non-defining TU needs to trigger dynamic initialization (either because the variable is statically initialized, or a use of the variable in the defining TU will be executed before any uses in another TU), they can avoid this overhead with the -fno-extern-tls-init option.
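To make the distinction concrete, here is a minimal sketch of a dynamically initialized thread_local defined in the current TU, next to a statically initialized __thread variable. The names next_id, thread_id, scratch and id_seen_by_new_thread are made up for illustration, and __thread is a GNU extension, so this assumes GCC or a compatible compiler.

```cpp
#include <atomic>
#include <thread>

// Hypothetical helper: stands in for any initializer that has to run at run time.
int next_id() {
    static std::atomic<int> counter{0};
    return ++counter;
}

// C++11 thread_local with dynamic initialization: each thread runs next_id()
// for its own copy before that copy is first used.
thread_local int thread_id = next_id();

// GNU __thread (extension): only constant initializers are accepted,
// so no run-time initialization check is needed on access.
__thread int scratch = 0;

// Reads thread_id from a freshly started thread and returns the value seen there.
int id_seen_by_new_thread() {
    int seen = 0;
    // thread_id is not captured; inside the lambda it names the new thread's copy.
    std::thread t([&seen] { seen = thread_id; });
    t.join();
    return seen;
}
```

Both variables are defined in the same TU that uses them, which is the case where the inliner is expected to remove the overhead. If thread_id were instead declared extern in another TU, and you were sure its initialization could never be triggered from there, that TU could be built with something like g++ -std=c++11 -pthread -fno-extern-tls-init.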