C++ OpenMP Tasks - passing by reference issue

Submitted by 大兔子大兔子 on 2019-12-24 13:05:46

Question


I am currently working on a system that reads a file of roughly 200 million records (lines). I buffer the records and use OpenMP tasks to process each batch while the input keeps being read. Each record in the buffer takes roughly 60 μs to process in work_on_data, which generates a string result. To avoid critical regions, I create a vector for the results and pass result placeholders (which I insert into this vector) by reference to the work_on_data function:

int i = 0;
string buffer[MAX_SIZE];
vector<string> task_results;

#pragma omp parallel shared(map_a, task_results), num_threads(X) 
#pragma omp single
{
    while (getline(fin, line) && !fin.eof())
    {
        buffer[i] = line;
        if (++i == MAX_SIZE)
        {
            string result = "";
            task_results.push_back(result);
#pragma omp task firstprivate(buffer)
            work_on_data(buffer, map_a, result);
            i = 0;
        }
    }
}

// eventually merge records in task_results

By the end of work_on_data, each result passed in is no longer an empty string (its initial value). However, when I merge the results, every entry in task_results is still an empty string. I may be doing something stupid here regarding scoping/addressing, but I don't see what the problem is. Any thoughts?

Thanks in advance.


Answer 1:


Pushing something into a vector causes a copy of it to be constructed inside the vector. So your work_on_data function doesn't get a reference to the string inside the vector, but to the string inside the if block. To fix this you could rewrite your code to give it access to the last element after the push_back, like so:

if (++i == MAX_SIZE)
{
    task_results.push_back("");
#pragma omp task firstprivate(buffer)
    work_on_data(buffer, map_a, task_results.back());
    i = 0;
}
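To see the copy described above in isolation, here is a minimal single-threaded sketch with no OpenMP. The names `fill`, `question_pattern` and `element_pattern` are hypothetical, and `fill` only stands in for `work_on_data`. In the question's code there is also a second, OpenMP-specific reason for the empty strings: `result` is declared inside the `single` block, so under the default data-sharing rules for tasks it becomes firstprivate, and `work_on_data` writes to the task's own copy of it.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for work_on_data: fills the string it is given.
void fill(std::string& out) { out = "done"; }

// Mirrors the question's pattern: push_back stores a *copy* of `result`,
// so writing through a reference to the local variable leaves the
// vector's element untouched.
std::vector<std::string> question_pattern() {
    std::vector<std::string> task_results;
    std::string result = "";
    task_results.push_back(result);  // copies the (empty) string into the vector
    fill(result);                    // modifies the local, not the copy
    return task_results;             // element 0 is still ""
}

// Writing through a reference to the vector's own element does change it.
// This is only safe because nothing else touches the vector concurrently.
std::vector<std::string> element_pattern() {
    std::vector<std::string> task_results;
    task_results.push_back("");
    fill(task_results.back());
    return task_results;
}
```

Both `assert(question_pattern().at(0).empty())` and `assert(element_pattern().at(0) == "done")` hold. The second pattern relies on a single thread owning the vector; with concurrently running tasks it stops being safe, as the edit below explains.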

Edit:

I had forgotten that reallocating a vector invalidates iterators and references to its elements, and additionally the call to back() leads to race conditions. With (smart) pointers, as the comments suggest, and a dedicated counter, this works for me with no segfault:

vector<shared_ptr<string>> task_results;

int ctr = 0;
...
if (++i == MAX_SIZE) {
    task_results.push_back(make_shared<string>());
    // ctr is firstprivate, so each task indexes the slot created for it
#pragma omp task firstprivate(buffer, ctr)
    work_on_data(buffer, map_a, *task_results[ctr]);
    i = 0;
    ++ctr;
}

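I think the back() version segfaults because back() is not evaluated when the task is created but when it runs, by whichever thread executes it, possibly at the same time as other tasks. If the main thread has pushed further elements in the meantime, several tasks can end up writing to the same string, and a push_back that reallocates the vector leaves any task still holding a reference into the old storage with a dangling reference.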
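A variant that avoids both problems is to copy each result's shared_ptr when the task is created and make that copy firstprivate, so a task never reads task_results while the generating thread keeps calling push_back. The sketch below is an assumption-laden illustration, not the answerer's code: `work_on_data` here is a hypothetical version that takes a batch and returns its string (it just records line lengths so the output is easy to check), `process_stream` is a hypothetical name, input comes from any `std::istream` instead of the question's `fin`, and the final partial batch is also flushed. Without OpenMP enabled, the pragmas are ignored and the tasks simply run in order.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <memory>
#include <sstream>  // only for std::istringstream in the usage example
#include <string>
#include <vector>

// Hypothetical stand-in for work_on_data: records each line's length.
std::string work_on_data(const std::vector<std::string>& batch) {
    std::string out;
    for (const std::string& rec : batch) {
        out += std::to_string(rec.size());
        out += ';';
    }
    return out;
}

// Hypothetical driver: reads lines in batches of `batch_size` and runs
// work_on_data on each batch in its own OpenMP task.
std::vector<std::string> process_stream(std::istream& in, std::size_t batch_size,
                                        int n_threads) {
    assert(batch_size > 0);
    std::vector<std::shared_ptr<std::string>> task_results;

    #pragma omp parallel num_threads(n_threads)
    #pragma omp single
    {
        std::vector<std::string> batch;
        std::string line;
        bool more = true;
        while (more) {
            more = static_cast<bool>(std::getline(in, line));
            if (more) batch.push_back(line);
            // Flush a full batch, or the final partial batch at end of input.
            if (batch.size() == batch_size || (!more && !batch.empty())) {
                auto res = std::make_shared<std::string>();
                task_results.push_back(res);  // only this thread modifies the vector
                // The task gets its own copies of batch and res at creation time.
                #pragma omp task firstprivate(batch, res)
                *res = work_on_data(batch);
                batch.clear();
            }
        }
    }  // implicit barrier: every task has finished here

    std::vector<std::string> merged;
    for (const auto& r : task_results) merged.push_back(*r);
    return merged;
}
```

For example, feeding it the five lines a, bb, ccc, dddd, eeeee with batch_size = 2 produces three results, "1;2;", "3;4;" and "5;", in task-creation order. Because the vector is modified only by the thread running the single block and is read only after the parallel region's implicit barrier, no critical section is needed.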



Source: https://stackoverflow.com/questions/27849876/c-openmp-tasks-passing-by-reference-issue
