Is the range-based for loop beneficial to performance?

Question

Reading various questions here on Stack Overflow about C++ iterators and performance**, I started wondering whether for(auto& elem : container) gets "expanded" by the compiler into the best possible version. (Kind of like auto, which the compiler deduces to the right type at compile time and which is therefore never slower and sometimes faster.)

** For example, does it matter if you write

for(iterator it = container.begin(), eit = container.end(); it != eit; ++it)

or

for(iterator it = container.begin(); it != container.end(); ++it)

for non-invalidating containers?

Answer 1:

The Standard is your friend; see [stmt.ranged]/1:

For a range-based for statement of the form
for ( for-range-declaration : expression ) statement
let range-init be equivalent to the expression surrounded by parentheses
( expression )
and for a range-based for statement of the form
for ( for-range-declaration : braced-init-list ) statement
let range-init be equivalent to the braced-init-list. In each case, a range-based for statement is equivalent to
{
  auto && __range = range-init;
  for ( auto __begin = begin-expr,
             __end = end-expr;
        __begin != __end;
        ++__begin )
  {
    for-range-declaration = *__begin;
    statement
  }
}

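So yes, the Standard guarantees an efficient form: begin-expr and end-expr are each evaluated only once, the iterator is advanced with pre-increment, and it is dereferenced once per iteration.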
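To make the "end-expr is evaluated once" point observable, here is a minimal sketch. CountingRange and both helper functions are hypothetical names invented for this illustration; they are not part of the Standard or of the original answer. The wrapper simply counts how often end() is called.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical wrapper (an assumption for this sketch) that counts
// how many times begin() and end() are called on it.
struct CountingRange {
    std::vector<int> data;
    mutable std::size_t begin_calls = 0;
    mutable std::size_t end_calls = 0;

    explicit CountingRange(std::size_t n) : data(n, 1) {}

    std::vector<int>::const_iterator begin() const { ++begin_calls; return data.begin(); }
    std::vector<int>::const_iterator end() const { ++end_calls; return data.end(); }
};

// Range-based for: end() is called once, when __end is initialized.
std::size_t end_calls_range_for(std::size_t n) {
    CountingRange r(n);
    int sum = 0;
    for (int x : r) sum += x;
    (void)sum;
    return r.end_calls;
}

// Hand-written loop that re-evaluates end() in the condition:
// one call per comparison, i.e. n + 1 calls for n elements.
std::size_t end_calls_naive_loop(std::size_t n) {
    CountingRange r(n);
    int sum = 0;
    for (auto it = r.begin(); it != r.end(); ++it) sum += *it;
    (void)sum;
    return r.end_calls;
}
```

For standard containers an optimizing compiler can usually inline and hoist the end() call anyway, so the difference tends to matter only when end() is expensive or opaque to the optimizer.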

Note that for a number of containers, such as vector, modifying the container (insert/erase) during this iteration can invalidate the iterators the loop is holding, and continuing to use them is undefined behavior.
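If elements really have to be appended to a vector while walking over it, one common workaround is to iterate by index over a snapshot of the original size instead of holding iterators. The sketch below is only an illustration of that idea; append_doubles is a hypothetical helper, not something from the original answer.

```cpp
#include <cstddef>
#include <vector>

// Appends the double of every original element to the vector.
// Indices stay valid across push_back even if the vector reallocates,
// whereas the iterators held by a range-based for would not.
std::vector<int> append_doubles(std::vector<int> v) {
    const std::size_t original_size = v.size();  // snapshot before modifying
    for (std::size_t i = 0; i < original_size; ++i) {
        const int doubled = v[i] * 2;  // read the value before push_back
        v.push_back(doubled);
    }
    return v;
}
```

Reading the value into a local before calling push_back avoids passing a reference into storage that may be reallocated during the call.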

Answer 2:

Range-for is as fast as possible since it caches the end iterator, uses pre-increment and only dereferences the iterator once.

So if you tend to write:

for(iterator i = cont.begin(); i != cont.end(); i++) { /**/ }

then, yes, range-for may be slightly faster. Since it's also easier to write, there's no reason not to use it (when appropriate).

N.B. I said it's as fast as possible; it isn't, however, faster than possible. You can achieve exactly the same performance if you write your manual loops carefully.
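The three properties mentioned above can be checked with an instrumented iterator. This is a sketch under stated assumptions: Stats, LoggingIter, LoggingRange and the two run_ functions are hypothetical names made up for this example, and the iterator just wraps a pointer into a small int array.

```cpp
#include <cstddef>

// Hypothetical counters for the operations the loop performs.
struct Stats {
    std::size_t pre = 0;    // calls to ++it
    std::size_t post = 0;   // calls to it++
    std::size_t deref = 0;  // calls to *it
};

// Hypothetical iterator that records every operation in a Stats object.
struct LoggingIter {
    const int* p;
    Stats* stats;

    int operator*() const { ++stats->deref; return *p; }
    LoggingIter& operator++() { ++stats->pre; ++p; return *this; }
    LoggingIter operator++(int) {
        ++stats->post;
        LoggingIter old = *this;  // post-increment has to keep a copy
        ++p;
        return old;
    }
    bool operator!=(const LoggingIter& o) const { return p != o.p; }
};

struct LoggingRange {
    const int* first;
    const int* last;
    Stats* stats;

    LoggingIter begin() const { return LoggingIter{first, stats}; }
    LoggingIter end() const { return LoggingIter{last, stats}; }
};

static const int kData[] = {1, 2, 3, 4};

// Range-based for over four elements.
Stats run_range_for() {
    Stats s;
    LoggingRange r{kData, kData + 4, &s};
    int sum = 0;
    for (int x : r) sum += x;
    (void)sum;
    return s;
}

// The hand-written loop from the answer, using i++.
Stats run_post_increment_loop() {
    Stats s;
    LoggingRange r{kData, kData + 4, &s};
    int sum = 0;
    for (LoggingIter it = r.begin(); it != r.end(); it++) sum += *it;
    (void)sum;
    return s;
}
```

For cheap iterators such as raw pointers or vector iterators, compilers typically optimize the post-increment copy away, so the cost shows up mainly with iterators that are expensive to copy.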

Answer 3:

Out of curiosity I decided to look at the assembly code for both approaches:

int foo1(const std::vector<int>& v) {
    int res = 0;
    for (auto x : v)
        res += x;
    return res;
}

int foo2(const std::vector<int>& v) {
    int res = 0;
    for (std::vector<int>::const_iterator it = v.begin(); it != v.end(); ++it)
      res += *it;
    return res;
}

And the assembly code (with -O3 and gcc 4.6) is exactly the same for both approaches (code for foo2 is omitted, since it is exactly the same):

080486d4 <foo1(std::vector<int, std::allocator<int> > const&)>:
80486d4:       8b 44 24 04             mov    0x4(%esp),%eax
80486d8:       8b 10                   mov    (%eax),%edx
80486da:       8b 48 04                mov    0x4(%eax),%ecx
80486dd:       b8 00 00 00 00          mov    $0x0,%eax
80486e2:       39 ca                   cmp    %ecx,%edx
80486e4:       74 09                   je     80486ef <foo1(std::vector<int, std::allocator<int> > const&)+0x1b>
80486e6:       03 02                   add    (%edx),%eax
80486e8:       83 c2 04                add    $0x4,%edx
80486eb:       39 d1                   cmp    %edx,%ecx
80486ed:       75 f7                   jne    80486e6 <foo1(std::vector<int, std::allocator<int> > const&)+0x12>
80486ef:       f3 c3                   repz ret

So, yes, both approaches are the same.

UPDATE: The same observation holds for other containers (or element types) such as vector<string> and map<string, string>. In those cases, it is especially important to use a reference in the range-based loop. Otherwise each element is copied into the loop variable and a lot of extra code appears (in the previous example this did not matter, since the vector contained just int values).
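The effect of iterating by value versus by reference can be made visible with an element type that counts its own copies. This is a minimal sketch; Tracked, make_tracked and the two copies_ functions are hypothetical names introduced only for this illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical element type that counts copy constructions.
struct Tracked {
    static std::size_t copies;
    int value;

    explicit Tracked(int v) : value(v) {}
    Tracked(const Tracked& o) : value(o.value) { ++copies; }
};
std::size_t Tracked::copies = 0;

// Builds a vector holding the values 0 .. n-1.
std::vector<Tracked> make_tracked(int n) {
    std::vector<Tracked> v;
    for (int i = 0; i < n; ++i) v.emplace_back(i);
    return v;
}

// `auto x` copy-initializes x from every element: one copy per iteration.
std::size_t copies_by_value(const std::vector<Tracked>& v) {
    Tracked::copies = 0;  // ignore copies made while building the vector
    long sum = 0;
    for (auto x : v) sum += x.value;
    (void)sum;
    return Tracked::copies;
}

// `const auto& x` binds directly to the element: no copies.
std::size_t copies_by_reference(const std::vector<Tracked>& v) {
    Tracked::copies = 0;
    long sum = 0;
    for (const auto& x : v) sum += x.value;
    (void)sum;
    return Tracked::copies;
}
```

With std::string elements each of those copies may involve a heap allocation, which is the kind of extra code the update refers to.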

For the case of map<string, string> the C++ code snippet used is:

int foo1(const std::map<std::string, std::string>& v) {
    int res = 0;
    for (const auto& x : v) {
        res += (x.first.size() + x.second.size());
    }
    return res;
}

int foo2(const std::map<std::string, std::string>& v) {
    int res = 0;
    for (auto it = v.begin(), end = v.end(); it != end; ++it) {
        res += (it->first.size() + it->second.size());
    }
    return res;
}

And the assembly code (for both cases) is:

8048d70:       56                      push   %esi
8048d71:       53                      push   %ebx
8048d72:       31 db                   xor    %ebx,%ebx
8048d74:       83 ec 14                sub    $0x14,%esp
8048d77:       8b 74 24 20             mov    0x20(%esp),%esi
8048d7b:       8b 46 0c                mov    0xc(%esi),%eax
8048d7e:       83 c6 04                add    $0x4,%esi
8048d81:       39 f0                   cmp    %esi,%eax
8048d83:       74 1b                   je     8048da0 
8048d85:       8d 76 00                lea    0x0(%esi),%esi
8048d88:       8b 50 10                mov    0x10(%eax),%edx
8048d8b:       03 5a f4                add    -0xc(%edx),%ebx
8048d8e:       8b 50 14                mov    0x14(%eax),%edx
8048d91:       03 5a f4                add    -0xc(%edx),%ebx
8048d94:       89 04 24                mov    %eax,(%esp)
8048d97:       e8 f4 fb ff ff          call   8048990 <std::_Rb_tree_increment(std::_Rb_tree_node_base const*)@plt>
8048d9c:       39 c6                   cmp    %eax,%esi
8048d9e:       75 e8                   jne    8048d88 
8048da0:       83 c4 14                add    $0x14,%esp
8048da3:       89 d8                   mov    %ebx,%eax
8048da5:       5b                      pop    %ebx
8048da6:       5e                      pop    %esi
8048da7:       c3                      ret

Answer 4:

It's possibly faster, in rare cases. Since you can't name the iterator, an optimizer can more easily prove that your loop cannot modify the iterator. This affects e.g. loop unrolling optimizations.

Answer 5:

No. It is the same as the old for loop with iterators. After all, the range-based for works with iterators internally. The compiler just produces equivalent code for both.

Source: https://stackoverflow.com/questions/10821756/is-the-ranged-based-for-loop-beneficial-to-performance

Tags

c++

performance

for-loop

foreach

c++11