Performance of pIter != cont.end() in for loop

后端未结

关注

 7  484

面向向阳花 2021-01-07 18:11

I was getting through \"Exceptional C++\" by Herb Sutter lately, and I have serious doubts about a particular recommendation he gives in Item 6 - Temporary Objects.

7条回答

刺人心 (楼主)

2021-01-07 18:36

UPD: The book you are speaking about has been published in 1999, unless I'm mistaking. That's 14 years ago, and in modern programming 14 years is a lot of time. Many recommendations that were good and reliable in 1999, may be completely obsolete by now. Though my answer is about a single compiler and a single platform, there is also a more general idea.

Caring about extra variables, reusing a return value of trivial methods and similar tricks of old C++ is a step back towards the C++ of 1990s. Trivial methods like end() should be inlined quite well, and the result of inlining should be optimized as a part of the code it is called from. 99% situations do not require manual actions such as creating an end variable at all. Such things should be done only if:

You KNOW that on some of the compilers/platforms you should run on the code is not optimized well.
It has become a bottleneck in your program ("avoid premature optimization").

I've looked at what is generated by 64-bit g++:

gcc version 4.6.3 20120918 (prerelease) (Ubuntu/Linaro 4.6.3-10ubuntu1)

Initially I thought that with optimizations on it should be ok and there should be no difference between two versions. But looks like things are strange: the version you considered non-optimal is actually better. I think, the moral is: there is no reason to try being smarter than a compiler. Let's see both versions.

#include 

using namespace std;

int main() {
  list l;
  l.push_back('a');

  for(list::iterator i=l.begin(); i != l.end(); i++)
      ;

  return 0;
}

int main1() {
  list l;
  l.push_back('a');
  list::iterator e=l.end();
  for(list::iterator i=l.begin(); i != e; i++)
      ;

  return 0;
}

Then we should compile this with optimizations on (I use 64-bit g++, you may try your compiler) and disassemble main and main1:

For main:

(gdb) disas main
Dump of assembler code for function main():
   0x0000000000400650 <+0>: push   %rbx
   0x0000000000400651 <+1>: mov    $0x18,%edi
   0x0000000000400656 <+6>: sub    $0x20,%rsp
   0x000000000040065a <+10>:    lea    0x10(%rsp),%rbx
   0x000000000040065f <+15>:    mov    %rbx,0x10(%rsp)
   0x0000000000400664 <+20>:    mov    %rbx,0x18(%rsp)
   0x0000000000400669 <+25>:    callq  0x400630 <_Znwm@plt>
   0x000000000040066e <+30>:    cmp    $0xfffffffffffffff0,%rax
   0x0000000000400672 <+34>:    je     0x400678 
   0x0000000000400674 <+36>:    movb   $0x61,0x10(%rax)
   0x0000000000400678 <+40>:    mov    %rax,%rdi
   0x000000000040067b <+43>:    mov    %rbx,%rsi
   0x000000000040067e <+46>:    callq  0x400610 <_ZNSt8__detail15_List_node_base7_M_hookEPS0_@plt>
   0x0000000000400683 <+51>:    mov    0x10(%rsp),%rax
   0x0000000000400688 <+56>:    cmp    %rbx,%rax
   0x000000000040068b <+59>:    je     0x400698 
   0x000000000040068d <+61>:    nopl   (%rax)
   0x0000000000400690 <+64>:    mov    (%rax),%rax
   0x0000000000400693 <+67>:    cmp    %rbx,%rax
   0x0000000000400696 <+70>:    jne    0x400690 
   0x0000000000400698 <+72>:    mov    %rbx,%rdi
   0x000000000040069b <+75>:    callq  0x400840  >::~list()>
   0x00000000004006a0 <+80>:    add    $0x20,%rsp
   0x00000000004006a4 <+84>:    xor    %eax,%eax
   0x00000000004006a6 <+86>:    pop    %rbx
   0x00000000004006a7 <+87>:    retq

Look at the commands located at 0x0000000000400683-0x000000000040068b. That's the loop body and it seems to be perfectly optimized:

   0x0000000000400690 <+64>:    mov    (%rax),%rax
   0x0000000000400693 <+67>:    cmp    %rbx,%rax
   0x0000000000400696 <+70>:    jne    0x400690

For main1:

(gdb) disas main1
Dump of assembler code for function main1():
   0x00000000004007b0 <+0>: push   %rbp
   0x00000000004007b1 <+1>: mov    $0x18,%edi
   0x00000000004007b6 <+6>: push   %rbx
   0x00000000004007b7 <+7>: sub    $0x18,%rsp
   0x00000000004007bb <+11>:    mov    %rsp,%rbx
   0x00000000004007be <+14>:    mov    %rsp,(%rsp)
   0x00000000004007c2 <+18>:    mov    %rsp,0x8(%rsp)
   0x00000000004007c7 <+23>:    callq  0x400630 <_Znwm@plt>
   0x00000000004007cc <+28>:    cmp    $0xfffffffffffffff0,%rax
   0x00000000004007d0 <+32>:    je     0x4007d6 
   0x00000000004007d2 <+34>:    movb   $0x61,0x10(%rax)
   0x00000000004007d6 <+38>:    mov    %rax,%rdi
   0x00000000004007d9 <+41>:    mov    %rsp,%rsi
   0x00000000004007dc <+44>:    callq  0x400610 <_ZNSt8__detail15_List_node_base7_M_hookEPS0_@plt>
   0x00000000004007e1 <+49>:    mov    (%rsp),%rdi
   0x00000000004007e5 <+53>:    cmp    %rbx,%rdi
   0x00000000004007e8 <+56>:    je     0x400818 
   0x00000000004007ea <+58>:    mov    %rdi,%rax
   0x00000000004007ed <+61>:    nopl   (%rax)
   0x00000000004007f0 <+64>:    mov    (%rax),%rax
   0x00000000004007f3 <+67>:    cmp    %rbx,%rax
   0x00000000004007f6 <+70>:    jne    0x4007f0 
   0x00000000004007f8 <+72>:    mov    (%rdi),%rbp
   0x00000000004007fb <+75>:    callq  0x4005f0 <_ZdlPv@plt>
   0x0000000000400800 <+80>:    cmp    %rbx,%rbp
   0x0000000000400803 <+83>:    je     0x400818 
   0x0000000000400805 <+85>:    nopl   (%rax)
   0x0000000000400808 <+88>:    mov    %rbp,%rdi
   0x000000000040080b <+91>:    mov    (%rdi),%rbp
   0x000000000040080e <+94>:    callq  0x4005f0 <_ZdlPv@plt>
   0x0000000000400813 <+99>:    cmp    %rbx,%rbp
   0x0000000000400816 <+102>:   jne    0x400808 
   0x0000000000400818 <+104>:   add    $0x18,%rsp
   0x000000000040081c <+108>:   xor    %eax,%eax
   0x000000000040081e <+110>:   pop    %rbx
   0x000000000040081f <+111>:   pop    %rbp
   0x0000000000400820 <+112>:   retq

The code for the loop is similar, it is:

   0x00000000004007f0 <+64>:    mov    (%rax),%rax
   0x00000000004007f3 <+67>:    cmp    %rbx,%rax
   0x00000000004007f6 <+70>:    jne    0x4007f0

But there is alot of extra stuff around the loop. Apparently, extra code has made the things WORSE.

0 讨论(0)

查看其它7个回答