Could passing too many arguments by reference be inefficient?

荒凉一梦 submitted on 2020-01-05 08:45:13

Question


Disclaimer: I'm using Intel Compiler 2017. If you want to know why I'm doing this, skip to the end of the question.

I have this code:

class A{
  std::vector<float> v;
  ...
  void foo();
  void bar();
};

void A::foo(){
  for(int i=0; i<bigNumber;i++){
    //something very expensive
    //call bar() many times per cycle;
  }
}

void A::bar(){
  //...
  v.push_back(/*something*/);
}

Now, let's suppose I want to parallelize foo() since it's very expensive. However, I can't simply use #pragma omp parallel for because of v.push_back().

To my knowledge, there are two alternatives here:

  1. We use #pragma omp critical
  2. We create a local version of v for each thread and then join them at the end of the parallel section, more or less as explained here.

Solution 1 is often considered a bad solution because threads contending for the critical section create a considerable overhead.

However, solution 2 requires modifying bar() like this:

class A{
  std::vector<float> v;
  ...
  void foo();
  void bar(std::vector<float> &local_v);
};

void A::foo(){
  #pragma omp parallel
  {
    std::vector<float> local_v;
    #pragma omp for
    for(int i=0; i<bigNumber;i++){
      //something very expensive
      //call bar(local_v) many times per cycle;
    }
    #pragma omp critical
    {
      v.insert(v.end(), local_v.begin(), local_v.end());
    }
  }
}

void A::bar(std::vector<float> &local_v){
  //...
  local_v.push_back(/*something*/);  // append to the thread-local vector, not the shared v
}
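
A self-contained sketch of this pattern, under some assumptions: the free functions bar and foo below are hypothetical stand-ins for A::bar and A::foo, the "expensive" work is replaced by pushing the loop index, and foo returns the merged vector instead of storing it in a member. Without OpenMP enabled the pragmas are ignored and the code simply runs serially.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for A::bar(): appends to the caller's buffer,
// never to the shared vector.
void bar(std::vector<float>& local_v, int i) {
    local_v.push_back(static_cast<float>(i));
}

// Hypothetical stand-in for A::foo(): returns the merged result.
// Assumes bigNumber >= 0.
std::vector<float> foo(int bigNumber) {
    std::vector<float> v;  // shared result, only touched inside the critical section
    #pragma omp parallel
    {
        std::vector<float> local_v;  // private to each thread
        #pragma omp for nowait
        for (int i = 0; i < bigNumber; i++) {
            bar(local_v, i);  // no synchronization needed here
        }
        #pragma omp critical
        {
            // One merge per thread instead of one lock per push_back.
            v.insert(v.end(), local_v.begin(), local_v.end());
        }
    }
    // Every iteration contributed exactly one element.
    assert(v.size() == static_cast<std::size_t>(bigNumber));
    return v;
}
```

Note that with several threads the order of the elements in the merged vector depends on which thread reaches the critical section first, so it may differ from the serial order.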

So far so good. Now, let's suppose that there is not only v, but 10 vectors, say v1, v2, ..., v10, or in any case 10 shared variables. In addition, let's suppose that bar() isn't called directly inside foo() but only after many nested calls. Something like foo(), which calls foo1(std::vector<float> &v1, ..., std::vector<float> &v10), which calls foo2(std::vector<float> &v1, ..., std::vector<float> &v10), and so on through many more nested calls until the last one finally calls bar(std::vector<float> &v1, ..., std::vector<float> &v10).

So, this looks like a nightmare for maintainability (I have to modify the declarations and call sites of all the nested functions). But even more important: we agree that passing by reference is efficient, but it is still a pointer copy. As you can see, a lot of pointers are copied many times here. Is it possible that all these copies add up to a real inefficiency?

What I care about most here is performance, so if you tell me "nah, it's fine because compilers are super intelligent and they do some sorcery so you can copy a trillion references with no drop in performance", then that is fine, but I don't know whether such sorcery exists.

Why I'm doing this: I'm trying to parallelize this code. In particular, I'm rewriting the while here as a for which can be parallelized, but if you follow the code you'll find that the callback onAffineShapeFound from here is called, which modifies the state of the shared object keys. This happens for many other variables, but this is the "deepest" case in this code.


Answer 1:


In a direct comparison between A::bar() and A::bar(std::vector<float> &v), the only difference is that the second version has to pass one additional pointer-sized argument (8 bytes on a typical 64-bit platform), either on the stack or in a register. In terms of performance, this is a pretty minimal effect: the stack pointer has to be adjusted whether or not the function takes arguments, so the only real difference is a single pointer copy, which might even be optimized away depending on the compiler. In terms of the performance of the function itself, constantly appending elements to a std::vector is going to be far more expensive, especially whenever the vector needs to reallocate (which will probably happen frequently, depending on how large it grows). Those costs will far exceed the cost of the pointer copy.

So, short version: Go nuts with the references.
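
For the ten-vector case in the question, one option (not part of the original answer, only a sketch under assumed names) is to bundle the per-thread buffers into a single struct, so that every nested call passes one reference instead of ten. LocalBuffers and run are hypothetical names, foo1/foo2/bar mirror the call chain described in the question, and only two of the ten vectors are shown.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical bundle of per-thread buffers (stand-ins for v1 ... v10).
struct LocalBuffers {
    std::vector<float> v1;
    std::vector<float> v2;
    // further buffers would be added here
};

// Deepest call: the only place that actually touches the buffers.
void bar(LocalBuffers& buf, float x) {
    buf.v1.push_back(x);
    buf.v2.push_back(2.0f * x);
}

// Intermediate calls forward a single reference (one pointer-sized argument).
void foo2(LocalBuffers& buf, float x) { bar(buf, x); }
void foo1(LocalBuffers& buf, float x) { foo2(buf, x); }

// Hypothetical driver: returns the total number of elements collected.
std::size_t run(int n) {
    LocalBuffers buf;  // in the parallel version, one instance per thread
    for (int i = 0; i < n; i++) {
        foo1(buf, static_cast<float>(i));
    }
    assert(buf.v1.size() == buf.v2.size());
    return buf.v1.size() + buf.v2.size();
}
```

Adding an eleventh buffer then means changing the struct and bar, while the signatures of the intermediate functions stay the same.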



Source: https://stackoverflow.com/questions/43374107/passing-too-many-arguments-by-reference-could-be-inefficient
