C++ Templates: Convincing self against code bloat

前端 未结 7 1061
暖寄归人
暖寄归人 2020-12-09 18:08

I have heard about code bloats in context of C++ templates. I know that is not the case with modern C++ compilers. But, I want to construct an example and convince myself.

相关标签:
7条回答
  • 2020-12-09 18:48

    Templates have nothing to do with that.

    Consider this small program:

    a.h:

    class a {
        int foo() { return 42; }
    };
    

    b.cpp:

    #include "a.h"
    
    void b() {
      a my_a;
      my_a.foo();
    }
    

    c.cpp:

    #include "a.h"
    
    void c() {
      a my_a;
      my_a.foo();
    }
    

    There are no templates, but you have the exact same problem. The same function is defined in multiple translation units. And the rule is the same: only one definition is allowed to exist in the final program, otherwise the compiler wouldn't be able to determine which one to call, and otherwise two function pointers pointing to the same function may point to different addresses.

    The "problem" with template code bloat is something different: it is if you create a lot of different instantiations of the same template. For example, using your class, this program will risk some code bloat:

    Array< int, 100 > i100;
    Array< int, 99 > i99;
    Array< long, 100 > l100;
    Array< long, 99> l99;
    
    i100.Data();
    i99.Data();
    l100.Data();
    l99.Data();
    

    Strictly speaking, the compiler is required to create 4 different instantiations of the Data function, one for each set of template parameters. In practice, some (but not all) compilers try to merge them back together, as long as the generated code is identical. (In this case, the assembly generated for Array< int, 100 > and Array< long, 100 > would be identical on many platforms, and the function doesn't depend on the array size either, so the 99 and 100 variants should also produce identical code, so a clever compiler will merge the instantiations back together.

    There is no magic to templates. They don't mysteriously "bloat" your code. They just give you a tool that lets you easily create a bajillion different types from the same template. If you actually use all of these types, it has to generate code for all of them. As always with C++, you pay for what you use. If you use both an Array<long, 100>, an Array<int, 100>, an Array<unsigned long, 100> and an Array<unsigned int, 100>, then you get four different classes, because four different classes was what you asked for. If you don't ask for four different classes, they don't cost you anything.

    0 讨论(0)
  • 2020-12-09 18:53

    In this specific case, you'll find that g++ will tend to inline the accessor if you have any sort of optimization on. That right there is some minor code bloat, though it's debatable if the overhead of the call would be any less.

    However, one easy way to verify what's getting compiled is with the nm tool. If I compile your code with a simple main() to exercise ArrayX::data() and ArrayY::data() and then compile it with -O0 to turn off inlining, I can run nm -C to see the symbols in the executable:

    % nm -C test
    0804a040 B ArrayX
    0804a1e0 B ArrayY
    08049f08 d _DYNAMIC
    08049ff4 d _GLOBAL_OFFSET_TABLE_
    0804858c R _IO_stdin_used
             w _Jv_RegisterClasses
    080484c4 W Array<int, 100u>::data()
    08049ef8 d __CTOR_END__
    08049ef4 d __CTOR_LIST__
    08049f00 D __DTOR_END__
    ...
    

    You'll see that the Array<int, 100u>::data() symbol only occurs once in the final executable even though the object file for each of the two translation units contains it's own copy. (The nm tool also works on object files. You can use it to check that x.o and y.o each has a copy of Array<int, 100u>::data().)

    If nm doesn't provide enough detail, you might also have a look at the objdump tool. It's a lot like nm, but with debugging symbols turned on, it can even show you things like a disassembly of the output executable with intermixed source lines.

    0 讨论(0)
  • 2020-12-09 18:53

    Here is a little utility script I have been using to get insight into just these issues. It shows you not only if a symbol is defined multiple times, but also how much code size each symbol is taking. I have found this extremely valuable for auditing code size issues.

    For example, here is a sample invocation:

    $ ~/nmsize src/upb_table.o 
     39.5%     488 upb::TableBase::DoInsert(upb::TableBase::Entry const&)
     57.9%     228 upb::TableBase::InsertBase(upb::TableBase::Entry const&)
     70.8%     159 upb::MurmurHash2(void const*, unsigned long, unsigned int)
     78.0%      89 upb::TableBase::GetEmptyBucket() const
     83.8%      72 vtable for upb::TableBase
     89.1%      65 upb::TableBase::TableBase(unsigned int)
     94.3%      65 upb::TableBase::TableBase(unsigned int)
     95.7%      17 typeinfo name for upb::TableBase
     97.0%      16 typeinfo for upb::TableBase
     98.0%      12 upb::TableBase::~TableBase()
     98.7%       9 upb::TableBase::Swap(upb::TableBase*)
     99.4%       8 upb::TableBase::~TableBase()
    100.0%       8 upb::TableBase::~TableBase()
    100.0%       0 
    100.0%       0 __cxxabiv1::__class_type_info
    100.0%       0 
    100.0%    1236 TOTAL
    

    In this case I have run it on a single .o file, but you can also run it on a .a file or on an executable. Here I can see that constructors and destructors were emitted twice or three times, which is a result of this bug.

    Here is the script:

    #!/usr/bin/env ruby
    
    syms = []
    total = 0
    IO.popen("nm --demangle -S #{ARGV.join(' ')}").each_line { |line|
      addr, size, scope, name = line.split(' ', 4)
      next unless addr and size and scope and name
      name.chomp!
      addr = addr.to_i(16)
      size = size.to_i(16)
      total += size
      syms << [size, name]
    }
    
    syms.sort! { |a,b| b[0] <=> a[0] }
    
    cumulative = 0.0
    syms.each { |sym|
      size = sym[0]
      cumulative += size
      printf "%5.1f%%  %6s %s\n", cumulative / total * 100, size.to_s, sym[1]
    }
    
    printf "%5.1f%%  %6s %s\n", 100, total, "TOTAL"
    

    If you run this on your own .a files or executable files, you should be able to convince yourself that you know exactly what is happening with your code size. I believe that recent versions of gcc may remove redundant or useless template instantiations at link time, so I recommend analyzing your actual executables.

    0 讨论(0)
  • 2020-12-09 18:54

    You're asking the wrong question - any "bloat" in your example has nothing to do with templates. (the answer to your question, btw, is to take the address of the member function in both modules and you'll see they're the same)

    What you really want to ask is, for each template instantiation, does the resulting executable grow linearly? The answer is no, the linker/optimizer will do magic.

    Compile an exe that creates one type:

    Array< int, 100 > MyArray;
    

    Note the resulting exe size. Now do it again:

    Array< int, 100 > MyArray;
    Array< int, 99 > MyArray;
    

    And so on, for 30 or so different versions, charting the resulting exe sizes. If templates were as horrible as people think, the exe size would grow by a fixed amount for each unique template instantiation.

    0 讨论(0)
  • 2020-12-09 18:54

    One test would be to put a static variable in data(), and increment it on each call, and report it.

    If MyArray::data() is occupying the same code space, then you should see it report 1 and then 2.

    If not, you should just see 1.

    I ran it, and got 1 then 2, indicating that it was running from the same set of code. To verify this was indeed true, I created another array with size parameter of 50, and it kicked out 1.

    Full code (with a couple tweaks and fixes) is below:

    Array.hpp:

    #ifndef ARRAY_HPP
    #define ARRAY_HPP
    #include <cstdlib>
    #include <iostream>
    
    using std::size_t;
    
    template< typename T, size_t N >
    class Array {
      public:
        T * data();
      private:
        T elems_[ N ];
    };
    
    template< typename T, size_t N >
    T * Array<T,N>::data() {
        static int i = 0;
        std::cout << ++i << std::endl;
        return elems_;
    }
    
    #endif
    

    types.hpp:

    #ifndef TYPES_HPP
    #define TYPES_HPP
    
    #include "Array.hpp"
    
    typedef Array< int, 100 > MyArray;
    typedef Array< int, 50 > MyArray2;
    
    #endif
    

    x.cpp:

    #include "types.hpp"
    
    void x()
    {
        MyArray arrayX;
        arrayX.data();
    }
    

    y.cpp:

    #include "types.hpp"
    
    void y()
    {
        MyArray arrayY;
        arrayY.data();
        MyArray2 arrayY2;
        arrayY2.data();
    }
    

    main.cpp:

    void x();
    void y();
    
    int main()
    {
        x();
        y();
        return 0;
    }
    
    0 讨论(0)
  • 2020-12-09 18:55

    A better illustration of code bloat with templates is using a template to generate code, not variables. The typical panic is due to the compiler generating code for each instance of the template (stencil). This is similar to code bloat due to inline functions and methods. However, modern compilers and linkers can perform magick to reduce code size, depending on the optimization settings.

    For example:

    template <typename Any_Type>
    void Print_Hello(const Any_Type& v)
    {
        std::cout << "Hello, your value is:\n"
                  << v
                  << "\n";
        return;
    }
    

    The above code is best thought of as a stencil. The compiler will generate the code depending on the type passed to Print_Hello. The bloat here is that very little of the code is actually dependent on the variable. (Which can be reduced, by factoring out const code & data.)

    The fear is that the compiler will generate the code for each instantiation using the same variable type, thus building up repetitive code:

    int main(void)
    {
      int a = 5;
      int b = 6;
      Print_Hello(a); // Instantiation #1
      Print_Hello(b); // Instantiation #2
      return 0;
    }
    

    The fear could also be extended when the template (stencil) is instantiated in different translation units.

    Modern compilers and linkers are smart. A smart compiler would recognize the template function call and convert into some unique mangled name. The compiler would then only use one instantiation for each call. Similar to function overloading.

    Even if the compiler was sloppy and generated multiple instances of the function (for the same type), the linker would recognize the duplicates and only put one instance into the executable.

    When used wrecklessly, a function or method template can add extra code. Examples are large functions that only differ by type in a few areas. They have a high ratio of non-typed code to type dependent code.

    An implementation of the above example with less bloat:

    void Print_Prompt(void)
    {
      std::cout << "Hello, your value is:\n";
      return;
    }
    
    template <typename Any_Type>
    void Better_Print_Hello(const Any_Type& v)
    {
      Print_Prompt();
      std::cout << v << "\n";
      return;
    }
    

    The primary difference is that the code that is not dependent on the variable type has been factored out into a new function. This may not seem worthwhile for this small example, but it illustrates the concept. And the concept is to refactor the function into pieces that are dependent on the variable type and those that are not. The pieces that are dependent are converted into templated functions.

    0 讨论(0)
提交回复
热议问题