OpenMP vs C++11 threads

Anonymous (unverified), submitted 2019-12-03 02:45:02

Question:

In the following example, the C++11 threads take about 50 seconds to execute, but the OpenMP threads take only 5 seconds. Any ideas why? (I can assure you this still holds if you do real work instead of doNothing, or if you run the two cases in a different order, etc.) I'm on a 16-core machine, too.

    #include <iostream>
    #include <omp.h>
    #include <chrono>
    #include <vector>
    #include <thread>

    using namespace std;

    void doNothing() {}

    int run(int algorithmToRun)
    {
        auto startTime = std::chrono::system_clock::now();

        for(int j=1; j<100000; ++j)
        {
            if(algorithmToRun == 1)
            {
                vector<thread> threads;
                for(int i=0; i<16; i++)
                {
                    threads.push_back(thread(doNothing));
                }
                for(auto& thread : threads) thread.join();
            }
            else if(algorithmToRun == 2)
            {
                #pragma omp parallel for num_threads(16)
                for(unsigned i=0; i<16; i++)
                {
                    doNothing();
                }
            }
        }

        auto endTime = std::chrono::system_clock::now();
        std::chrono::duration<double> elapsed_seconds = endTime - startTime;

        return elapsed_seconds.count();
    }

    int main()
    {
        int cppt = run(1);
        int ompt = run(2);

        cout<<cppt<<endl;
        cout<<ompt<<endl;

        return 0;
    }

Answer 1:

OpenMP uses a thread pool for its pragmas. Spinning up and tearing down threads is expensive. OpenMP avoids this overhead, so all it does is the actual work plus the minimal shared-memory shuttling of the execution state. In your std::thread code, you spin up and tear down a new set of 16 threads on every iteration.
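
To illustrate the point about thread creation cost, here is a minimal sketch that creates the 16 threads once and moves the repetition loop inside each thread, instead of creating 16 new threads per iteration. The names `doWork`, `g_calls` and `runPersistentThreads` are hypothetical and not from the question; the counter exists only so the result can be checked. This is not a reimplementation of OpenMP's runtime (there is no barrier between iterations); it only shows how the creation cost can be amortized.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical stand-in for real work: counts calls so the total can be verified.
std::atomic<long> g_calls{0};

void doWork() { g_calls.fetch_add(1, std::memory_order_relaxed); }

// Creates `numThreads` threads once; each thread repeats its task `iterations`
// times. Returns the total number of doWork() calls that were made.
long runPersistentThreads(int numThreads, int iterations)
{
    g_calls = 0;
    std::vector<std::thread> threads;
    threads.reserve(numThreads);

    for (int t = 0; t < numThreads; ++t)
    {
        // The loop over iterations lives inside the thread, so thread creation
        // is paid numThreads times in total rather than on every iteration.
        threads.emplace_back([iterations] {
            for (int j = 0; j < iterations; ++j) doWork();
        });
    }

    for (auto& th : threads) th.join();
    return g_calls.load();
}
```

Calling `runPersistentThreads(16, 100000)` from `main` creates 16 threads in total, whereas the loop in the question creates 16 threads on each of its roughly 100000 iterations. Whether this restructuring is possible depends on whether the iterations need to synchronize with each other.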



Answer 2:

I ran the 100-iteration loop from the article "Choosing the right threading framework"; it took 0.0727 milliseconds with OpenMP, 0.6759 milliseconds with Intel TBB, and 0.5962 milliseconds with the C++ thread library.

I also applied what AruisDante suggested:

    void nested_loop(int max_i, int band)
    {
        for (int i = 0; i < max_i; i++)
        {
            doNothing(band);
        }
    }
    ...
    else if (algorithmToRun == 5)
    {
        thread bristle(nested_loop, max_i, band);
        bristle.join();
    }

This code appears to take less time than your original C++11 thread section.


