thread-local-storage

What are the real ELF TLS ABI requirements for each cpu arch?

时间秒杀一切 submitted on 2019-12-04 00:46:20
Question: Ulrich Drepper's paper on thread-local storage outlines the TLS ABI for several different CPU architectures, but I'm finding it insufficient as a basis for implementing TLS for two reasons: (1) it omits a number of important architectures like ARM, MIPS, etc. (while including a bunch of completely irrelevant ones like Itanium); (2) more importantly, it mixes a lot of implementation details with ABI, so that it's hard to tell which properties are required for interoperability, and which are just aspects of his
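
For readers less familiar with the paper: the four TLS access models it describes (general dynamic, local dynamic, initial exec, local exec) can be requested explicitly from C with the GCC/Clang `tls_model` attribute. A minimal sketch, assuming a GCC- or Clang-compatible compiler targeting ELF; the variable names and `sum_tls` are made up for illustration:

```c
/* Sketch: requesting the ELF TLS access models explicitly with the
   GCC/Clang tls_model attribute. Variable names are hypothetical. */

/* No attribute: the compiler picks a model based on -fPIC/-fPIE and
   visibility (in -fPIC code this is typically global-dynamic, which may
   call __tls_get_addr). */
__thread int tls_default = 1;

/* initial-exec: the variable lives in the static TLS block at an offset
   from the thread pointer resolved at load time; no __tls_get_addr call. */
__thread int tls_initial_exec __attribute__((tls_model("initial-exec"))) = 2;

/* local-exec: offset known at static link time; only valid for variables
   defined in, and used from, the main executable. */
static __thread int tls_local_exec __attribute__((tls_model("local-exec"))) = 3;

/* local-dynamic: intended for TLS variables local to a single module. */
static __thread int tls_local_dynamic __attribute__((tls_model("local-dynamic"))) = 4;

/* Reads all four variables from the calling thread's TLS blocks. */
int sum_tls(void) {
    return tls_default + tls_initial_exec + tls_local_exec + tls_local_dynamic;
}
```

The relocations and instruction sequences behind each model differ per architecture, so the per-architecture psABI documents, rather than this sketch, are where the interoperability requirements are pinned down.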

Does Go have something like ThreadLocal from Java?

自作多情 submitted on 2019-12-04 00:05:32
I use Go and Gin to set up my website and want to measure database access time. I use goroutines, so if I don't use something like thread-local storage, I must change almost every function to do it. Does Go have a good way to do it?
Answer: The Go runtime and standard libraries do not provide goroutine-local storage or a goroutine identifier that can be used to implement goroutine-local storage. The third-party gls package implements goroutine-local storage in an interesting way. Some find this package horrifying and others think it's clever. The Go team recommends passing context explicitly as a function argument

Allocate intermediate multidimensional arrays in Cython without acquiring the GIL

邮差的信 submitted on 2019-12-03 12:32:16
Question: I'm trying to use Cython to parallelize an expensive operation which involves generating intermediate multidimensional arrays. The following very simplified code illustrates the sort of thing I'm trying to do:

    import numpy as np
    cimport cython
    cimport numpy as np
    from cython.parallel cimport prange
    from libc.stdlib cimport malloc, free

    @cython.boundscheck(False)
    @cython.wraparound(False)
    def embarrasingly_parallel_example(char[:, :] A):
        cdef unsigned int m = A.shape[0]
        cdef unsigned int n = A

The Cost of thread_local

折月煮酒 submitted on 2019-12-03 10:19:41
Now that C++ is adding thread_local storage as a language feature, I'm wondering a few things: What is the cost of thread_local likely to be? In memory? For read and write operations? Associated with that: how do operating systems usually implement this? It would seem like anything declared thread_local would have to be given thread-specific storage space for each thread created. Storage space: size of the variable * number of threads, or possibly (sizeof(var) + sizeof(var*)) * number of threads.
Answer: There are two basic ways of implementing thread-local storage: using some sort of system call that
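
The "one copy per thread" intuition in the question can be checked directly. A minimal sketch, assuming C11 `_Thread_local` (the C counterpart of C++ `thread_local`) and POSIX threads; `counter`, `worker`, and `each_thread_has_own_copy` are hypothetical names:

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define NTHREADS 4

/* C11 thread-local counter: every thread starts with its own zeroed copy. */
static _Thread_local long counter = 0;

/* What each worker observed, indexed by worker id. */
static uintptr_t seen_addr[NTHREADS];
static long seen_value[NTHREADS];

static void *worker(void *arg) {
    int id = (int)(intptr_t)arg;
    for (int i = 0; i <= id; i++)
        counter++;                      /* touches only this thread's copy */
    seen_addr[id] = (uintptr_t)&counter;
    seen_value[id] = counter;
    return NULL;
}

/* Returns 1 if every worker saw a distinct copy holding only its own
   increments and the calling thread's copy is untouched, else 0. */
int each_thread_has_own_copy(void) {
    pthread_t t[NTHREADS];
    int started = 0;
    for (int i = 0; i < NTHREADS; i++) {
        if (pthread_create(&t[i], NULL, worker, (void *)(intptr_t)i) != 0)
            break;
        started++;
    }
    /* On common implementations a joinable thread's stack and TLS block are
       not released before it is joined, so the recorded addresses are distinct. */
    for (int i = 0; i < started; i++)
        pthread_join(t[i], NULL);
    if (started != NTHREADS)
        return 0;
    for (int i = 0; i < NTHREADS; i++) {
        if (seen_value[i] != i + 1)
            return 0;
        for (int j = i + 1; j < NTHREADS; j++)
            if (seen_addr[i] == seen_addr[j])
                return 0;
    }
    return counter == 0;
}
```

On glibc, the static TLS blocks for the executable and for libraries loaded at startup are allocated together with each new thread, while blocks for modules loaded later with dlopen are allocated on first use; either way the memory cost grows roughly with sizeof(var) times the number of live threads.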

Using ThreadLocal in instance variables

匆匆过客 submitted on 2019-12-03 08:23:23
Do Java ThreadLocal variables produce thread-local values if they are used as instance variables (e.g., in a method that generates thread-local objects), or must they always be static to do so? As an example, assume a typical scenario where several expensive-to-initialize objects of a class that is not thread-safe need to be instantiated in a single static initialization block, stored in static variables of a single class (e.g., in a Map data structure), and from then on used for intensive processing by numerous different threads. To achieve thread safety, obviously a different copy of each

Allocate intermediate multidimensional arrays in Cython without acquiring the GIL

早过忘川 submitted on 2019-12-03 02:52:32
I'm trying to use Cython to parallelize an expensive operation which involves generating intermediate multidimensional arrays. The following very simplified code illustrates the sort of thing I'm trying to do:

    import numpy as np
    cimport cython
    cimport numpy as np
    from cython.parallel cimport prange
    from libc.stdlib cimport malloc, free

    @cython.boundscheck(False)
    @cython.wraparound(False)
    def embarrasingly_parallel_example(char[:, :] A):
        cdef unsigned int m = A.shape[0]
        cdef unsigned int n = A.shape[1]
        cdef np.ndarray[np.float64_t, ndim = 2] out = np.empty((m, m), np.float64)
        cdef unsigned int
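
Independent of Cython, a common way around needing the GIL for temporaries is to give each worker its own buffer from plain malloc/free (which the question already cimports from libc.stdlib) instead of a NumPy array. A minimal sketch of that pattern in C with POSIX threads rather than Cython's prange; computing A * A^T is a made-up stand-in for the poster's operation, and `parallel_gram`, `job_t`, and `NWORKERS` are hypothetical names:

```c
#include <pthread.h>
#include <stdlib.h>

#define NWORKERS 2

/* Hypothetical work item: each worker fills rows [row_begin, row_end) of out. */
typedef struct {
    const char *A;          /* m x n input, row-major */
    double *out;            /* m x m output, row-major */
    size_t m, n;
    size_t row_begin, row_end;
    int ok;                 /* set to 1 by the worker on success */
} job_t;

static void *worker(void *arg) {
    job_t *job = arg;
    /* Private intermediate buffer: plain malloc/free, no interpreter lock. */
    double *scratch = malloc((job->n ? job->n : 1) * sizeof *scratch);
    if (scratch == NULL)
        return NULL;        /* job->ok stays 0 */
    for (size_t i = job->row_begin; i < job->row_end; i++) {
        for (size_t j = 0; j < job->m; j++) {
            /* Stand-in for the expensive step: fill the intermediate array... */
            for (size_t k = 0; k < job->n; k++)
                scratch[k] = (double)job->A[i * job->n + k] * job->A[j * job->n + k];
            /* ...then reduce it into the shared output (row ranges are disjoint). */
            double s = 0.0;
            for (size_t k = 0; k < job->n; k++)
                s += scratch[k];
            job->out[i * job->m + j] = s;
        }
    }
    free(scratch);
    job->ok = 1;
    return NULL;
}

/* Computes out = A * A^T with NWORKERS threads; returns 0 on success. */
int parallel_gram(const char *A, size_t m, size_t n, double *out) {
    pthread_t threads[NWORKERS];
    job_t jobs[NWORKERS];
    size_t chunk = (m + NWORKERS - 1) / NWORKERS;
    int started = 0, rc = 0;
    for (int w = 0; w < NWORKERS; w++) {
        size_t b = (size_t)w * chunk < m ? (size_t)w * chunk : m;
        size_t e = b + chunk < m ? b + chunk : m;
        jobs[w] = (job_t){ A, out, m, n, b, e, 0 };
        if (pthread_create(&threads[w], NULL, worker, &jobs[w]) != 0) {
            rc = -1;
            break;
        }
        started++;
    }
    for (int w = 0; w < started; w++) {
        pthread_join(threads[w], NULL);
        if (!jobs[w].ok)
            rc = -1;
    }
    return rc;
}
```

Allocating once per worker rather than once per (i, j) pair keeps malloc off the inner loop; in Cython the same idea is usually expressed by allocating inside a `with nogil, parallel():` block before the prange loop.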

Is there a way I can persist context locals for sub-threads?

你说的曾经没有我的故事 submitted on 2019-12-02 18:28:11
Question: I'm currently writing a library that records backend calls, such as those made through the boto3 and requests libraries, and then populates a global "data" object based on things like the status codes of responses. I originally had the data object as global, but then I realized this was a bad idea because when the application runs in parallel, the data object is modified simultaneously (which could corrupt it). However, I want to keep this object separate for each invocation of my application
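
The sub-thread part of the question comes down to inheritance. C's thread-local variables, for example, give every newly created thread a fresh copy rather than the spawning thread's value, so per-invocation data has to be handed to sub-threads explicitly. A minimal sketch of that hand-off using C11 `_Thread_local` and POSIX threads; `request_id`, `child_args`, and `run_child` are hypothetical names, and whether a given Python context-local mechanism behaves the same way should be checked against its own documentation:

```c
#include <pthread.h>
#include <stddef.h>

/* Per-thread "context": each thread starts with its own zero value. */
static _Thread_local int request_id = 0;

typedef struct {
    int inherited;      /* value passed explicitly from the parent */
    int seen_before;    /* child's own copy before adopting it */
    int seen_after;     /* child's own copy after adopting it */
} child_args;

static void *child(void *p) {
    child_args *a = p;
    a->seen_before = request_id;   /* fresh copy: 0, not the parent's value */
    request_id = a->inherited;     /* adopt the value the parent passed in */
    a->seen_after = request_id;
    return NULL;
}

/* Sets the caller's context, spawns a sub-thread, and records what it saw.
   Returns 0 on success, -1 if the thread could not be created. */
int run_child(int parent_value, child_args *out) {
    request_id = parent_value;
    out->inherited = request_id;   /* explicit hand-off of the context */
    pthread_t t;
    if (pthread_create(&t, NULL, child, out) != 0)
        return -1;
    pthread_join(t, NULL);
    return 0;
}
```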

Why is thread local storage so slow?

ぃ、小莉子 submitted on 2019-12-02 15:09:14
I'm working on a custom mark-release style memory allocator for the D programming language that works by allocating from thread-local regions. It seems that the thread-local storage bottleneck is causing a huge (~50%) slowdown in allocating memory from these regions compared to an otherwise identical single-threaded version of the code, even after designing my code to have only one TLS lookup per allocation/deallocation. This is based on allocating/freeing memory a large number of times in a loop, and I'm trying to figure out if it's an artifact of my benchmarking method. My understanding is
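
One common way to keep the TLS cost to a single access per allocation is to take the address of the thread-local region once per call and work through a plain local pointer afterwards. A minimal sketch of a mark/release bump allocator in C11; `region_t`, `region_alloc`, `region_mark`, `region_release`, and `REGION_SIZE` are hypothetical and are not the poster's D code:

```c
#include <stddef.h>
#include <stdint.h>

#define REGION_SIZE 4096

/* Hypothetical per-thread bump region. */
typedef struct {
    _Alignas(16) unsigned char buf[REGION_SIZE];
    size_t used;                 /* always a multiple of 16 */
} region_t;

static _Thread_local region_t tls_region;

/* One TLS address computation per call: the region's address is taken once
   and all further work goes through the local pointer r. */
void *region_alloc(size_t size) {
    region_t *r = &tls_region;
    size_t aligned = (size + 15) & ~(size_t)15;    /* 16-byte granularity */
    if (aligned < size || aligned > REGION_SIZE - r->used)
        return NULL;                               /* overflow or region full */
    void *p = r->buf + r->used;
    r->used += aligned;
    return p;
}

/* Mark/release: remember the current position and roll back to it later. */
size_t region_mark(void) { return tls_region.used; }
void region_release(size_t mark) { tls_region.used = mark; }
```

Whether this removes the slowdown depends on how the platform implements TLS: on ELF targets, variables defined in the main executable are usually reached through a thread-pointer-relative access, while code in a shared library built with -fPIC may call __tls_get_addr on each access.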

OpenMP and Thread Local Storage identifier with icc

霸气de小男生 submitted on 2019-12-02 05:36:03
Question: Here is a simple test program:

    #include <stdlib.h>

    __thread int a = 0;

    int main() {
    #pragma omp parallel default(none)
        {
            a = 1;
        }
        return 0;
    }

gcc compiles this without any problems with -fopenmp, but icc (ICC) 12.0.2 20110112 with -openmp complains:

    test.c(7): error: "a" must be specified in a variable list at enclosing OpenMP parallel pragma
      #pragma omp parallel default(none)

I have no clue which paradigm (i.e. shared, private, threadprivate) applies to this type of variable. Which one
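
One alternative that sidesteps how a particular compiler classifies __thread variables under default(none) is to let OpenMP own the per-thread copy with the threadprivate directive. Threadprivate variables have a predetermined data-sharing attribute, so they do not need to appear in any clause. A minimal sketch; `set_in_parallel` is a hypothetical name, and it assumes building with an OpenMP flag such as gcc's -fopenmp (without one, the pragmas are ignored and the code simply runs serially):

```c
/* Let OpenMP own the per-thread copy instead of using __thread. */
static int a = 0;
#pragma omp threadprivate(a)

/* Hypothetical wrapper around the poster's parallel region. */
int set_in_parallel(void) {
#pragma omp parallel default(none)
    {
        a = 1;      /* each thread writes its own copy of a */
    }
    return a;       /* the calling (master) thread's copy */
}
```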

No luck compiling __thread using ndk clang 3.4/3.5

拈花ヽ惹草 submitted on 2019-12-02 04:24:51
I am trying to use __thread in this small program without luck. Any idea whether __thread TLS is supported by clang 3.4/3.5 in NDK r10c? The same program compiles fine with the NDK gcc 4.8/4.9 toolchains and with native clang/gcc compilers. Here are the program and the compile line:

    __thread int counter;

    int main () {
        counter = 20;
        return 0;
    }

    [armeabi] Compile++ thumb: test <= test.cpp
    /Users/padlar/android/android-ndk-r10c/toolchains/llvm-3.5/prebuilt/darwin-x86/bin/clang++ -MMD -MP -MF ./obj/local/armeabi/objs/test/test.o.d -gcc-toolchain /Users/padlar/android/android-ndk-r10c/toolchains/arm-linux-androideabi-4.8/prebuilt/darwin
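
If __thread cannot be used with a particular toolchain or target, the POSIX thread-specific data API is a long-standing portable fallback that needs no compiler TLS support. A minimal sketch that mirrors the poster's counter; `counter_key`, `make_counter_key`, and `counter_ptr` are hypothetical names, and the sketch does not diagnose why that clang build rejected __thread:

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical fallback for `__thread int counter;`: one heap-allocated int
   per thread, reached through a POSIX thread-specific data key. */
static pthread_key_t counter_key;
static pthread_once_t counter_once = PTHREAD_ONCE_INIT;
static int counter_key_ok = 0;

static void make_counter_key(void) {
    /* free() is the destructor: each thread's int is released when that
       thread calls pthread_exit or returns from its start routine. */
    counter_key_ok = (pthread_key_create(&counter_key, free) == 0);
}

/* Returns this thread's counter, allocating it (zero-initialized) on first
   use; returns NULL if the key or the allocation could not be created. */
int *counter_ptr(void) {
    pthread_once(&counter_once, make_counter_key);
    if (!counter_key_ok)
        return NULL;
    int *p = pthread_getspecific(counter_key);
    if (p == NULL) {
        p = calloc(1, sizeof *p);
        if (p == NULL)
            return NULL;
        if (pthread_setspecific(counter_key, p) != 0) {
            free(p);
            return NULL;
        }
    }
    return p;
}
```

With this in place, the program's `counter = 20;` becomes `*counter_ptr() = 20;`, with a NULL check in real code.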