Near the start of my c++ application, my main thread uses OMP to parallelize several for loops. After the first parallelized for loop, I see that the threads used remain in existence for the duration of the application, and are reused for subsequent OMP for loops executed from the main thread, using the command (working in CentOS 7):
for i in $(pgrep myApplication); do ps -mo pid,tid,fname,user,psr -p $i;done
Later in my program, I launch a boost thread from the main thread, in which I parallelize a for loop using OMP. At this point, I see an entirely new set of threads are created, which has a decent amount of overhead.
Is it possible to make the OMP parallel for loop within the boost thread reuse the original OMP thread pool created by the main thread?
Edit: Some pseudo code:
myFun(data)
{
// Want to reuse OMP thread pool from main here.
omp parallel for
for(int i = 0; i < N; ++i)
{
// Work on data
}
}
main
{
// Thread pool created here.
omp parallel for
for(int i = 0; i < N; ++i)
{
// do stuff
}
boost::thread myThread(myFun) // Constructor starts thread.
// Do some serial stuff, no OMP.
myThread.join();
}
The interaction of OpenMP with other threading mechanisms is deliberately left out of the specification and is therefore dependent heavily on the implementation. The GNU OpenMP runtime keeps a pointer to the thread pool in TLS and propagates it down the (nested) teams. Threads started via pthread_create
(or boost::thread
or std::thread
) do not inherit the pointer and therefore spawn a fresh pool. It is probably the case with other OpenMP runtimes too.
There is a requirement in the standard that basically forces such behaviour in most implementations. It is about the semantics of the threadprivate variables and how their values are retained across the different parallel regions forked from the same thread (OpenMP standard, 2.15.2 threadprivate
Directive):
The values of data in the threadprivate variables of non-initial threads are guaranteed to persist between two consecutive active
parallel
regions only if all of the following conditions hold:
- Neither
parallel
region is nested inside another explicit parallel region.- The number of threads used to execute both
parallel
regions is the same.- The thread affinity policies used to execute both
parallel
regions are the same.- The value of the dyn-var internal control variable in the enclosing task region is false at entry to both
parallel
regions.If these conditions all hold, and if a threadprivate variable is referenced in both regions, then threads with the same thread number in their respective regions will reference the same copy of that variable.
This, besides performance, is probably the main reason for using thread pools in OpenMP runtimes.
Now, imagine that two parallel regions forked by two separate threads share the same worker thread pool. A parallel region was forked by the first thread and some threadprivate variables were set. Later a second parallel region is forked by the same thread, where those threadprivate variables are used. But somewhere between the two parallel regions, a parallel region is forked by the second thread and worker threads from the same pool are utilised. Since most implementations keep threadprivate variables in TLS, the above semantics can no longer be asserted. A possible solution would be to add new worker threads to the pool for each separate thread, which is not much different than creating new thread pools.
I'm not aware of any workarounds to make the worker thread pool shared. And if possible, it will not be portable, therefore the main benefit of OpenMP will be lost.
来源:https://stackoverflow.com/questions/38254882/how-to-reuse-omp-thread-pool-created-by-main-thread-in-worker-thread