Why may thread_local not be applied to non-static data members and how to implement thread-local non-static data members?

Posted by 有些话、适合烂在心里 on 2019-12-10 14:36:34

Question


Why may thread_local not be applied to non-static data members? The accepted answer to this question says: "There is no point in making non-static structure or class members thread-local." Honestly, I see many good reasons to make non-static data members thread-local.

Assume we have some kind of ComputeEngine with a member function computeSomething that is called many times in succession. Some of the work inside the member function can be done in parallel. To do so, each thread needs some kind of ComputeHelper that provides, for example, auxiliary data structures. So what we actually want is the following:

class ComputeEngine {
 public:
  int computeSomething(Args args) {
    int sum = 0;
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < MAX; ++i) {
      // ...
      helper.xxx();
      // ...
    }
    return sum;
  }
 private:
  thread_local ComputeHelper helper;
};

Unfortunately, this code will not compile. What we could do instead is this:

class ComputeEngine {
 public:
  int computeSomething(Args args) {
    int sum = 0;
    #pragma omp parallel
    {
      ComputeHelper helper;
      #pragma omp for reduction(+:sum)
      for (int i = 0; i < MAX; ++i) {
        // ...
        helper.xxx();
        // ...
      }
    }
    return sum;
  }
};

However, this constructs and destroys a ComputeHelper in every thread on each call of computeSomething. Assuming that constructing a ComputeHelper is expensive (for example, due to the allocation and initialization of huge vectors), we may want to reuse the ComputeHelpers between successive calls. This leads me to the following boilerplate approach:

class ComputeEngine {
  struct ThreadLocalStorage {
    ComputeHelper helper;
  };
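 public:
  // One slot per potential OpenMP thread; without this the vector is empty
  // and tls[omp_get_thread_num()] would be out of bounds.
  ComputeEngine() : tls(omp_get_max_threads()) {}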
  int computeSomething(Args args) {
    int sum = 0;
    #pragma omp parallel
    {
      ComputeHelper &helper = tls[omp_get_thread_num()].helper;
      #pragma omp for reduction(+:sum)
      for (int i = 0; i < MAX; ++i) {
        // ...
        helper.xxx();
        // ...
      }
    }
    return sum;
  }
 private:
  std::vector<ThreadLocalStorage> tls;
};

  1. Why may thread_local not be applied to non-static data members? What is the rationale behind this restriction? Have I not given a good example where thread-local non-static data members make perfect sense?
  2. What are best practices to implement thread-local non-static data members?

Answer 1:


As for why thread_local cannot be applied to non-static data members: it would disrupt the usual ordering guarantee for such members. Data members within a single public/private/protected group must be laid out in memory in the same order as in the class declaration, and a member living in thread-local storage could not honor that. There is also the question of what happens when you allocate an object on the stack: the TLS members would not go on the stack with the rest of the object.

As for how to work around this, I suggest using boost::thread_specific_ptr. You can put one of these inside your class and get the behavior you want.
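For readers who cannot pull in Boost, a roughly similar effect can be sketched with the standard library alone. This is not how boost::thread_specific_ptr is implemented; it is a stand-in that keeps one lazily created value per calling thread inside the object, guarded by a mutex. The names PerThread, local, touch, helpers and run_demo are made up for illustration, and ComputeHelper is reduced to a counter.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <thread>
#include <unordered_map>
#include <vector>

// Hypothetical helper (not a Boost or standard type): one lazily created T
// per calling thread, owned by the enclosing object.
template <typename T>
class PerThread {
 public:
  // Returns the calling thread's instance, default-constructing it on first use.
  T& local() {
    std::lock_guard<std::mutex> lock(mutex_);
    std::unique_ptr<T>& slot = slots_[std::this_thread::get_id()];
    if (!slot) slot = std::make_unique<T>();
    return *slot;  // stays valid after unlock: the T lives on the heap
  }

  // Number of threads that have touched this object so far.
  std::size_t size() {
    std::lock_guard<std::mutex> lock(mutex_);
    return slots_.size();
  }

 private:
  std::mutex mutex_;
  std::unordered_map<std::thread::id, std::unique_ptr<T>> slots_;
};

// Stand-in for the question's ComputeHelper; the counter is illustrative.
struct ComputeHelper {
  int uses = 0;
};

class ComputeEngine {
 public:
  // Each call bumps only the calling thread's helper.
  int touch() { return ++helper_.local().uses; }
  std::size_t helpers() { return helper_.size(); }

 private:
  PerThread<ComputeHelper> helper_;  // behaves like a per-thread member
};

// Runs `calls` touches on each of `threads` threads and returns how many
// helpers the engine ended up holding.
std::size_t run_demo(int threads, int calls) {
  ComputeEngine engine;
  std::vector<std::thread> pool;
  for (int t = 0; t < threads; ++t) {
    pool.emplace_back([&engine, calls] {
      int last = 0;
      for (int i = 0; i < calls; ++i) last = engine.touch();
      assert(last == calls);  // no other thread touched this helper
    });
  }
  for (std::thread& th : pool) th.join();
  return engine.helpers();
}
```

One difference worth keeping in mind: boost::thread_specific_ptr cleans up a thread's value when that thread exits, whereas this sketch holds every helper until the owning object is destroyed, and it takes a lock on every lookup. Caching the reference returned by local() once per parallel region, as the question does with its helper reference, keeps that cost out of the inner loop.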




Answer 2:


The way that thread-local storage usually works is that you get exactly one pointer in a thread-specific data structure (e.g. the TEB on Windows).

As long as all thread local variables are static, the compiler can easily compute the size of these fields, allocate a struct of the size and assign a static offset into that struct to each field.
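To make the static case concrete, here is a small sketch of what the language does allow: a static thread_local data member. The names Counter, per_thread, bump_on_other_thread and demo are illustrative only. It shows that the member takes no space inside the object and that each thread sees its own copy.

```cpp
#include <cassert>
#include <thread>

struct Counter {
  int per_object = 0;                  // stored inside each Counter object
  static thread_local int per_thread;  // one copy per thread, outside any object
};

thread_local int Counter::per_thread = 0;

// The static thread_local member adds nothing to the object layout.
static_assert(sizeof(Counter) == sizeof(int),
              "TLS member is not part of the object");

// Increments the copy belonging to a freshly started thread and reports what
// that thread saw; the caller's copy is left untouched.
int bump_on_other_thread(int times) {
  int seen = 0;
  std::thread worker([&seen, times] {
    for (int i = 0; i < times; ++i) ++Counter::per_thread;
    seen = Counter::per_thread;
  });
  worker.join();
  return seen;
}

// Usage: the calling thread's value survives the worker's increments.
void demo() {
  Counter::per_thread = 7;
  assert(bump_on_other_thread(3) == 3);  // worker started from its own 0
  assert(Counter::per_thread == 7);      // caller's copy is unchanged
}
```

Because the set of such variables is known when the program is built, the implementation can reserve a fixed slot for each one in every thread's block, which is the scheme the answer describes.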

As soon as you allow non-static fields, this whole scheme becomes much more complicated. One way to solve it would be an additional level of indirection, storing an index in each object (so classes now carry hidden fields, which is rather unexpected).
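The index-plus-indirection idea can be written by hand in application code. The following is only a sketch of that idea, not what any compiler does: ThreadLocalMember, get, Engine and run_demo are hypothetical names. Each object stores an index, and the values live in a per-thread table that is looked up through that index.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <deque>
#include <thread>

// Hypothetical wrapper: the object stores only an index (the "hidden field");
// the actual values live in a per-thread table.
template <typename T>
class ThreadLocalMember {
 public:
  ThreadLocalMember() : index_(next_index().fetch_add(1)) {}

  // Returns the calling thread's value for this particular object.
  T& get() {
    std::deque<T>& table = per_thread_table();
    // Grow on first use; appending to a deque keeps references to the
    // existing elements valid.
    if (table.size() <= index_) table.resize(index_ + 1);
    return table[index_];
  }

 private:
  // Process-wide counter that hands out one index per object.
  static std::atomic<std::size_t>& next_index() {
    static std::atomic<std::size_t> counter{0};
    return counter;
  }
  // One table per thread, shared by all objects of this member type.
  static std::deque<T>& per_thread_table() {
    thread_local std::deque<T> table;
    return table;
  }
  const std::size_t index_;
};

struct Engine {
  ThreadLocalMember<int> hits;  // acts like a `thread_local int hits;` member
};

// Writes from the calling thread, reads and writes from a worker, and
// returns the value the worker initially saw for object `a`.
int run_demo() {
  Engine a, b;
  a.hits.get() = 1;  // calling thread, object a
  b.hits.get() = 2;  // calling thread, object b
  int seen_by_worker = -1;
  std::thread worker([&] {
    seen_by_worker = a.hits.get();  // worker's own slot, value-initialized
    a.hits.get() = 99;              // does not affect the calling thread
  });
  worker.join();
  assert(a.hits.get() == 1 && b.hits.get() == 2);
  return seen_by_worker;
}
```

The sketch deliberately leaves out the hard parts that make this awkward as a language feature: indices are never recycled, a slot is not reset or destroyed when its object dies, and every access pays for the extra lookup.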

Instead of foisting the complexity of such a scheme on the implementer, they apparently decided to let each application deal with it as needed.



Source: https://stackoverflow.com/questions/32365653/why-may-thread-local-not-be-applied-to-non-static-data-members-and-how-to-implem
