Performance of pIter != cont.end() in for loop

后端 未结 7 484
面向向阳花
面向向阳花 2021-01-07 18:11

I was getting through \"Exceptional C++\" by Herb Sutter lately, and I have serious doubts about a particular recommendation he gives in Item 6 - Temporary Objects.

7条回答
  •  刺人心
    刺人心 (楼主)
    2021-01-07 18:36

    UPD: The book you are speaking about has been published in 1999, unless I'm mistaking. That's 14 years ago, and in modern programming 14 years is a lot of time. Many recommendations that were good and reliable in 1999, may be completely obsolete by now. Though my answer is about a single compiler and a single platform, there is also a more general idea.

    Caring about extra variables, reusing a return value of trivial methods and similar tricks of old C++ is a step back towards the C++ of 1990s. Trivial methods like end() should be inlined quite well, and the result of inlining should be optimized as a part of the code it is called from. 99% situations do not require manual actions such as creating an end variable at all. Such things should be done only if:

    1. You KNOW that on some of the compilers/platforms you should run on the code is not optimized well.
    2. It has become a bottleneck in your program ("avoid premature optimization").

    I've looked at what is generated by 64-bit g++:

    gcc version 4.6.3 20120918 (prerelease) (Ubuntu/Linaro 4.6.3-10ubuntu1)
    

    Initially I thought that with optimizations on it should be ok and there should be no difference between two versions. But looks like things are strange: the version you considered non-optimal is actually better. I think, the moral is: there is no reason to try being smarter than a compiler. Let's see both versions.

    #include 
    
    using namespace std;
    
    int main() {
      list l;
      l.push_back('a');
    
      for(list::iterator i=l.begin(); i != l.end(); i++)
          ;
    
      return 0;
    }
    
    int main1() {
      list l;
      l.push_back('a');
      list::iterator e=l.end();
      for(list::iterator i=l.begin(); i != e; i++)
          ;
    
      return 0;
    }
    

    Then we should compile this with optimizations on (I use 64-bit g++, you may try your compiler) and disassemble main and main1:

    For main:

    (gdb) disas main
    Dump of assembler code for function main():
       0x0000000000400650 <+0>: push   %rbx
       0x0000000000400651 <+1>: mov    $0x18,%edi
       0x0000000000400656 <+6>: sub    $0x20,%rsp
       0x000000000040065a <+10>:    lea    0x10(%rsp),%rbx
       0x000000000040065f <+15>:    mov    %rbx,0x10(%rsp)
       0x0000000000400664 <+20>:    mov    %rbx,0x18(%rsp)
       0x0000000000400669 <+25>:    callq  0x400630 <_Znwm@plt>
       0x000000000040066e <+30>:    cmp    $0xfffffffffffffff0,%rax
       0x0000000000400672 <+34>:    je     0x400678 
       0x0000000000400674 <+36>:    movb   $0x61,0x10(%rax)
       0x0000000000400678 <+40>:    mov    %rax,%rdi
       0x000000000040067b <+43>:    mov    %rbx,%rsi
       0x000000000040067e <+46>:    callq  0x400610 <_ZNSt8__detail15_List_node_base7_M_hookEPS0_@plt>
       0x0000000000400683 <+51>:    mov    0x10(%rsp),%rax
       0x0000000000400688 <+56>:    cmp    %rbx,%rax
       0x000000000040068b <+59>:    je     0x400698 
       0x000000000040068d <+61>:    nopl   (%rax)
       0x0000000000400690 <+64>:    mov    (%rax),%rax
       0x0000000000400693 <+67>:    cmp    %rbx,%rax
       0x0000000000400696 <+70>:    jne    0x400690 
       0x0000000000400698 <+72>:    mov    %rbx,%rdi
       0x000000000040069b <+75>:    callq  0x400840  >::~list()>
       0x00000000004006a0 <+80>:    add    $0x20,%rsp
       0x00000000004006a4 <+84>:    xor    %eax,%eax
       0x00000000004006a6 <+86>:    pop    %rbx
       0x00000000004006a7 <+87>:    retq   
    

    Look at the commands located at 0x0000000000400683-0x000000000040068b. That's the loop body and it seems to be perfectly optimized:

       0x0000000000400690 <+64>:    mov    (%rax),%rax
       0x0000000000400693 <+67>:    cmp    %rbx,%rax
       0x0000000000400696 <+70>:    jne    0x400690 
    

    For main1:

    (gdb) disas main1
    Dump of assembler code for function main1():
       0x00000000004007b0 <+0>: push   %rbp
       0x00000000004007b1 <+1>: mov    $0x18,%edi
       0x00000000004007b6 <+6>: push   %rbx
       0x00000000004007b7 <+7>: sub    $0x18,%rsp
       0x00000000004007bb <+11>:    mov    %rsp,%rbx
       0x00000000004007be <+14>:    mov    %rsp,(%rsp)
       0x00000000004007c2 <+18>:    mov    %rsp,0x8(%rsp)
       0x00000000004007c7 <+23>:    callq  0x400630 <_Znwm@plt>
       0x00000000004007cc <+28>:    cmp    $0xfffffffffffffff0,%rax
       0x00000000004007d0 <+32>:    je     0x4007d6 
       0x00000000004007d2 <+34>:    movb   $0x61,0x10(%rax)
       0x00000000004007d6 <+38>:    mov    %rax,%rdi
       0x00000000004007d9 <+41>:    mov    %rsp,%rsi
       0x00000000004007dc <+44>:    callq  0x400610 <_ZNSt8__detail15_List_node_base7_M_hookEPS0_@plt>
       0x00000000004007e1 <+49>:    mov    (%rsp),%rdi
       0x00000000004007e5 <+53>:    cmp    %rbx,%rdi
       0x00000000004007e8 <+56>:    je     0x400818 
       0x00000000004007ea <+58>:    mov    %rdi,%rax
       0x00000000004007ed <+61>:    nopl   (%rax)
       0x00000000004007f0 <+64>:    mov    (%rax),%rax
       0x00000000004007f3 <+67>:    cmp    %rbx,%rax
       0x00000000004007f6 <+70>:    jne    0x4007f0 
       0x00000000004007f8 <+72>:    mov    (%rdi),%rbp
       0x00000000004007fb <+75>:    callq  0x4005f0 <_ZdlPv@plt>
       0x0000000000400800 <+80>:    cmp    %rbx,%rbp
       0x0000000000400803 <+83>:    je     0x400818 
       0x0000000000400805 <+85>:    nopl   (%rax)
       0x0000000000400808 <+88>:    mov    %rbp,%rdi
       0x000000000040080b <+91>:    mov    (%rdi),%rbp
       0x000000000040080e <+94>:    callq  0x4005f0 <_ZdlPv@plt>
       0x0000000000400813 <+99>:    cmp    %rbx,%rbp
       0x0000000000400816 <+102>:   jne    0x400808 
       0x0000000000400818 <+104>:   add    $0x18,%rsp
       0x000000000040081c <+108>:   xor    %eax,%eax
       0x000000000040081e <+110>:   pop    %rbx
       0x000000000040081f <+111>:   pop    %rbp
       0x0000000000400820 <+112>:   retq   
    

    The code for the loop is similar, it is:

       0x00000000004007f0 <+64>:    mov    (%rax),%rax
       0x00000000004007f3 <+67>:    cmp    %rbx,%rax
       0x00000000004007f6 <+70>:    jne    0x4007f0 
    

    But there is alot of extra stuff around the loop. Apparently, extra code has made the things WORSE.

提交回复
热议问题