How does the gcc `__thread` work?

情深已故 2020-12-08 07:22

How is __thread in gcc implemented? Is it simply a wrapper over pthread_getspecific and pthread_setspecific?

With my program t

2 Answers
  • 2020-12-08 08:00

    gcc's __thread has exactly the same semantics as C11's _Thread_local. You don't say which platform you are programming for, and the implementation details vary between platforms. For example, on x86-64 Linux, gcc should compile accesses to thread-local variables as memory instructions with a %fs segment prefix (%gs on 32-bit x86) instead of calling pthread_getspecific.
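
    To make the "one copy per thread" semantics concrete, here is a minimal sketch using POSIX threads. The names counter, bump and run_tls_demo are hypothetical, chosen only for this illustration. Each worker increments its own copy of the variable, so no locking is needed and the calling thread's copy is left untouched.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* One independent copy of `counter` exists in every thread. */
static __thread int counter = 0;

/* Thread body: bump this thread's own copy n times, then report it.
   `arg` points to an int that holds n on entry and the result on exit. */
static void *bump(void *arg) {
    int *slot = arg;
    int n = *slot;
    for (int i = 0; i < n; i++)
        counter++;              /* not shared, so no lock is required */
    *slot = counter;
    return NULL;
}

/* Hypothetical helper: runs two workers and records what each one saw.
   Returns the calling thread's own copy of `counter`. */
int run_tls_demo(int results[2]) {
    pthread_t tid[2];
    results[0] = 1000;          /* iteration count for worker 0 */
    results[1] = 2000;          /* iteration count for worker 1 */
    for (int i = 0; i < 2; i++) {
        int rc = pthread_create(&tid[i], NULL, bump, &results[i]);
        assert(rc == 0);
        (void) rc;
    }
    for (int i = 0; i < 2; i++)
        pthread_join(tid[i], NULL);
    return counter;             /* the workers never touched this copy */
}
```

    After run_tls_demo returns, results holds 1000 and 2000 and the return value is 0, because each thread only ever saw its own counter. On older glibc versions this may need to be built with gcc -pthread.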

  • 2020-12-08 08:06

    Recent GCC versions (e.g. GCC 5) support C11 and its thread_local, for instance when compiling with gcc -std=c11. As FUZxxl commented, instead of C11 thread_local you could use the __thread qualifier, which older GCC versions also support. Read about Thread Local Storage.

    pthread_getspecific is indeed comparatively slow, since it involves a function call. (It belongs to the POSIX threads library, so it is not provided by GCC but by the C library, e.g. GNU glibc or musl-libc.) Using thread_local variables will very probably be faster.

    Look at the thread/pthread_getspecific.c file in the musl source code for an example implementation. Read this answer to a related question.

    And __thread and thread_local are (often) not magically translated to calls to pthread_getspecific. They usually involve a specific addressing mode and/or register, with help from the compiler, the linker and the runtime system. The details are implementation specific and depend on the ABI; on Linux, I guess that since x86-64 has more registers and addressing modes, its implementation of TLS is faster than on i386. On the contrary, it could happen that some implementations of pthread_getspecific themselves use internal thread_local variables (inside your implementation of POSIX threads).
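
    For comparison with the compiler-supported route, the explicit key-based POSIX API can be sketched as follows. This is only an illustration: slot_key, make_key, worker_keyed and run_key_demo are hypothetical names, and error handling is reduced to assertions. Every access goes through a library call and a key created at run time, which is the overhead that __thread avoids.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

static pthread_key_t slot_key;                 /* one key shared by all threads */
static pthread_once_t slot_once = PTHREAD_ONCE_INIT;

/* Create the key exactly once, whichever thread gets here first. */
static void make_key(void) {
    int rc = pthread_key_create(&slot_key, NULL);
    assert(rc == 0);
    (void) rc;
}

/* Thread body: bind this thread's pointer to the key, read it back
   through pthread_getspecific, and increment the int it points to. */
static void *worker_keyed(void *arg) {
    pthread_once(&slot_once, make_key);
    pthread_setspecific(slot_key, arg);
    int *mine = pthread_getspecific(slot_key);
    *mine += 1;
    return NULL;
}

/* Hypothetical helper: runs two workers on values[0] and values[1].
   Returns 1 if the calling thread, which never set a value, sees NULL. */
int run_key_demo(int values[2]) {
    pthread_t tid[2];
    for (int i = 0; i < 2; i++) {
        int rc = pthread_create(&tid[i], NULL, worker_keyed, &values[i]);
        assert(rc == 0);
        (void) rc;
    }
    for (int i = 0; i < 2; i++)
        pthread_join(tid[i], NULL);
    pthread_once(&slot_once, make_key);
    return pthread_getspecific(slot_key) == NULL;
}
```

    The value stored under the key is a per-thread void pointer, so each worker only modifies its own element of values, and a thread that never called pthread_setspecific reads back NULL.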

    As an example, compiling the following code

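    #include <pthread.h>

    /* Key created elsewhere with pthread_key_create. */
    extern const pthread_key_t key;

    /* Thread-local variable: one copy per thread. */
    __thread int data;

    /* Direct TLS access, no library call. */
    int
    get_data (void) {
      return data;
    }

    /* Same idea through the POSIX API; assumes a non-NULL value
       was stored with pthread_setspecific in this thread. */
    int
    get_by_key (void) {
      return *(int*) (pthread_getspecific (key));
    }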
    

    using GCC 5.2 (on Debian/Sid) with gcc -m32 -S -O2 -fverbose-asm gives the following code for get_data using TLS:

      .type get_data, @function
    get_data:
    .LFB3:
      .cfi_startproc
      movl  %gs:data@ntpoff, %eax   # data,
      ret
      .cfi_endproc
    

    and the following code for get_by_key, with an explicit call to pthread_getspecific:

    get_by_key:
    .LFB4:
      .cfi_startproc
      subl  $24, %esp   #,
      .cfi_def_cfa_offset 28
      pushl key # key
      .cfi_def_cfa_offset 32
      call  pthread_getspecific #
      movl  (%eax), %eax    # MEM[(int *)_4], MEM[(int *)_4]
      addl  $28, %esp   #,
      .cfi_def_cfa_offset 4
      ret
      .cfi_endproc
    

    Hence using TLS with __thread (or thread_local in C11) should probably be faster than using pthread_getspecific (avoiding the overhead of a call).

    Note that in C11 thread_local is a convenience macro defined in <threads.h> (a C11 standard header); it expands to the _Thread_local keyword.
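
    A short sketch of the C11 spelling, assuming a C library that ships <threads.h> (for example glibc 2.28 or later, or musl). The names tl_value, set_and_read and run_thread_local_demo are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <threads.h>

/* `thread_local` comes from <threads.h> here; every thread starts
   with its own copy of tl_value initialized to 7. */
thread_local int tl_value = 7;

/* Thread body: overwrite this thread's copy and return it. */
static int set_and_read(void *arg) {
    tl_value = *(int *) arg;
    return tl_value;
}

/* Hypothetical helper: runs one worker, stores its return value in
   *worker_result, and returns the calling thread's own tl_value. */
int run_thread_local_demo(int *worker_result) {
    thrd_t t;
    int wanted = 42;
    int rc = thrd_create(&t, set_and_read, &wanted);
    assert(rc == thrd_success);
    rc = thrd_join(t, worker_result);
    assert(rc == thrd_success);
    (void) rc;
    return tl_value;            /* unaffected by the worker's write */
}
```

    The worker reports 42 while the caller still reads 7, which is the same behaviour as with __thread; only the spelling differs.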
