Effective optimization strategies on modern C++ compilers

梦如初夏 2020-12-22 17:02

I'm working on scientific code that is very performance-critical. An initial version of the code has been written and tested, and now, with profiler in hand, it's time to

19 answers
  •  醉酒成梦
    2020-12-22 17:23

    Here are some techniques I have used:

    • Templates to fix the bounds of the innermost loops at compile time (this makes them really fast).
    • The __restrict__ keyword to deal with pointer-aliasing problems.
    • Reserve vector capacity beforehand, using sane defaults.
    • Avoid std::map (it can be really slow).
    • Vector append/insert can be significantly slow. If that is the case, raw operations may be faster.
    • N-byte memory alignment (Intel has #pragma vector aligned, http://www.intel.com/software/products/compilers/docs/clin/main_cls/cref_cls/common/cppref_pragma_vector.htm).
    • Try to keep the working set within the L1/L2 caches.
    • Compile with NDEBUG defined.
    • Profile with oprofile, and use opannotate to look at specific lines (STL overhead is clearly visible then).
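A few of the items above (reserving vector capacity, memory alignment, and NDEBUG) can be illustrated with a minimal sketch. Every name here (`make_squares`, `AlignedBlock`, `is_aligned`, `checked_get`) is hypothetical and not taken from the answer, and the 64-byte alignment is an assumed cache-line size. Standard `alignas` is used in place of the Intel-specific pragma.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: build a vector of n squares. Reserving the capacity
// up front means push_back does not reallocate inside the loop.
inline std::vector<double> make_squares(std::size_t n) {
    std::vector<double> v;
    v.reserve(n);  // one allocation instead of repeated growth
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(static_cast<double>(i) * static_cast<double>(i));
    }
    return v;
}

// Hypothetical block aligned to 64 bytes (an assumed cache-line size).
struct alignas(64) AlignedBlock {
    double data[8];
};

// Returns true if p is a multiple of `alignment` bytes.
inline bool is_aligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// assert() expands to nothing when NDEBUG is defined, so this bounds check
// costs nothing in a build compiled with -DNDEBUG.
inline double checked_get(const std::vector<double>& v, std::size_t i) {
    assert(i < v.size());
    return v[i];
}
```

Compiling the same file with and without `-DNDEBUG` and comparing the profiles is one way to see how much time the checks take in a hot loop.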

    Here are sample parts of the profile data (so you know where to look for problems):

     * Output annotated source file with samples
     * Output all files
     *
     * CPU: Core 2, speed 1995 MHz (estimated)
    --
     * Total samples for file : "/home/andrey/gamess/source/blas.f"
     *
     * 1020586 14.0896
    --
     * Total samples for file : "/home/andrey/libqc/rysq/src/fock.cpp"
     *
     * 962558 13.2885
    --
     * Total samples for file : "/usr/include/boost/numeric/ublas/detail/matrix_assign.hpp"
     *
     * 748150 10.3285
    
    --
     * Total samples for file : "/usr/include/boost/numeric/ublas/functional.hpp"
     *
     * 639714  8.8315
    --
     * Total samples for file : "/home/andrey/gamess/source/eigen.f"
     *
     * 429129  5.9243
    --
     * Total samples for file : "/usr/include/c++/4.3/bits/stl_algobase.h"
     *
     * 411725  5.6840
    --
    

    An example of code from my project:

    template<int ni, int nj, int nk, int nl>
    inline void eval(const Data::density_type &D, const Data::fock_type &F,
                     const double *__restrict Q, double scale) {
    
        const double * __restrict Dij = D[0];
        ...
        double * __restrict Fij = F[0];
        ...
    
        for (int l = 0, kl = 0, ijkl = 0; l < nl; ++l) {
            for (int k = 0; k < nk; ++k, ++kl) {
                for (int j = 0, ij = 0; j < nj; ++j, ++jk, ++jl) {
                    for (int i = 0; i < ni; ++i, ++ij, ++ik, ++il, ++ijkl) {
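
Because the project snippet above is only a fragment, here is a smaller self-contained sketch of the same two ideas: loop bounds passed as template parameters and `__restrict`-qualified pointers. It is not the author's kernel. The function names (`contract`, `contract_dispatch`), the computation, and the set of specialized sizes are all hypothetical. `__restrict` is a compiler extension accepted by GCC, Clang, and MSVC, not standard C++.

```cpp
#include <cassert>

// Hypothetical kernel: F[i] += scale * sum_j Q[i*NJ + j] * D[j].
// NI and NJ are compile-time constants, so the compiler knows the trip
// counts and is free to unroll the loops. __restrict promises that the
// three arrays do not overlap.
template <int NI, int NJ>
inline void contract(const double* __restrict D,
                     const double* __restrict Q,
                     double* __restrict F,
                     double scale) {
    for (int i = 0; i < NI; ++i) {
        double acc = 0.0;
        for (int j = 0; j < NJ; ++j) {
            acc += Q[i * NJ + j] * D[j];
        }
        F[i] += scale * acc;
    }
}

// Hypothetical dispatcher: maps runtime sizes onto the specialized kernels.
// Returns false when no specialization exists for the requested sizes.
inline bool contract_dispatch(int ni, int nj,
                              const double* D, const double* Q,
                              double* F, double scale) {
    assert(ni > 0 && nj > 0);
    if (ni == 2 && nj == 2) { contract<2, 2>(D, Q, F, scale); return true; }
    if (ni == 3 && nj == 3) { contract<3, 3>(D, Q, F, scale); return true; }
    return false;
}
```

The dispatcher is where this approach costs something: each supported size combination is a separate instantiation, so it pays off mainly when the set of sizes is small and known in advance.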
    
