C++ template for unrolling a loop using a switch?

Submitted by 为君一笑 on 2019-12-01 20:20:55

Question


My question is similar to "Can one unroll a loop when working with an integer template parameter?", but I want to mix compile time and run time. Specifically, a constant NBLOCKS is known at compile time, and I want to write a switch on a variable start_block that is known only at run time, where NBLOCKS determines the number of cases in the switch. Here is what I have using macros:

#define CASE_UNROLL(i_loop)         \
  case i_loop : \
    dst.blocks[i_loop+1] -= (load_unaligned_epi8(srcblock) != zero) & block1; \
    srcblock += sizeof(*srcblock);

  switch(start_block)
    {
      CASE_UNROLL(0);
#if NBLOCKS > 2
      CASE_UNROLL(1);
#endif
#if NBLOCKS > 3
      CASE_UNROLL(2);
#endif
#if NBLOCKS > 4
      CASE_UNROLL(3);
#endif
...
...
#if NBLOCKS > 15
      CASE_UNROLL(14);
#endif
#if NBLOCKS > 16
#error "Too many blocks"
#endif
    }

I find it very ugly, especially if I want to raise the bound from 16 to 32.

I would like to know whether it is possible to write this using template metaprogramming. The hard part is that, for performance reasons, it is crucial that the switch is compiled to a jump table rather than a sequence of nested conditionals.

Note that the question is very similar to "C++/C++11 - Switch statement for variadic templates?", but as far as I understand, the solution proposed there removes the switch by using an array of function pointers initialized partly at compile time and partly at run time. I can't pay the price of a function call here.

I'm working with GCC, in case some nasty extension is needed.
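
To make the intent concrete, here is a minimal scalar sketch of what the macro-generated switch computes. It assumes plain int blocks instead of the SIMD types in the question, and the names NBLOCKS_DEMO, Blocks, apply_switch and apply_loop are hypothetical. Execution enters at case start_block and falls through every later case, so the switch behaves like a loop from start_block to the last case, without the loop counter.

```cpp
#include <array>

// Hypothetical stand-in for the compile-time constant in the question.
constexpr int NBLOCKS_DEMO = 4;
using Blocks = std::array<int, NBLOCKS_DEMO>;

// Switch form: the entry point is chosen at run time, then execution falls through.
// Cases run from 0 to NBLOCKS_DEMO - 2 because each one writes dst[i + 1].
inline void apply_switch(Blocks& dst, const int* src, int start_block) {
    switch (start_block) {
        case 0: dst[1] -= *src++;  // fall through
        case 1: dst[2] -= *src++;  // fall through
        case 2: dst[3] -= *src++;
    }
}

// Equivalent loop form, shown only for comparison.
inline void apply_loop(Blocks& dst, const int* src, int start_block) {
    for (int i = start_block; i < NBLOCKS_DEMO - 1; ++i) {
        dst[i + 1] -= *src++;
    }
}
```

Both functions leave dst unchanged when start_block is past the last case; the switch form simply has no matching label.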


Answer 1:


You could simply use Boost.Preprocessor with BOOST_PP_REPEAT(COUNT, MACRO, DATA):

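#include <boost/preprocessor/repetition/repeat.hpp>

// BOOST_PP_REPEAT invokes its macro as MACRO(z, n, data);
// Z is the repetition-dimension argument and is unused here.
#define APPLY_FUNC(Z, INDEX, FUNC) FUNC(INDEX);

// ...

switch(start_block)
{
    BOOST_PP_REPEAT(NBLOCKS, APPLY_FUNC, CASE_UNROLL);
}

That should expand to:

switch(start_block)
{
    CASE_UNROLL(0);
    CASE_UNROLL(1);
    CASE_UNROLL(2);
    // ...
    CASE_UNROLL(NBLOCKS-1);
}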
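
If Boost is not available, the same preprocessor-repetition idea can be sketched by hand. This is not Boost's implementation: REPEAT, REPEAT_1 to REPEAT_4, REPEAT_CAT, SUM_CASE, NBLOCK_DEMO and sum_from are hypothetical names, and the chain stops at 4 only to keep the example short. The point is that the expansion is an ordinary switch with consecutive case labels, so the compiler remains free to lower it to a jump table.

```cpp
// Hand-rolled repetition: REPEAT(N, M) expands to M(0) M(1) ... M(N-1).
#define REPEAT_1(M) M(0)
#define REPEAT_2(M) REPEAT_1(M) M(1)
#define REPEAT_3(M) REPEAT_2(M) M(2)
#define REPEAT_4(M) REPEAT_3(M) M(3)
// Two-step concatenation so that N is macro-expanded before token pasting.
#define REPEAT_CAT(N, M) REPEAT_##N(M)
#define REPEAT(N, M) REPEAT_CAT(N, M)

#define NBLOCK_DEMO 4  // hypothetical compile-time block count

// One case per index; there is no break, so execution falls through to later cases.
#define SUM_CASE(I) case I: total += values[I];

// Sums values[start] .. values[NBLOCK_DEMO - 1]; any other start value yields 0.
inline int sum_from(const int (&values)[NBLOCK_DEMO], int start) {
    int total = 0;
    switch (start) {
        REPEAT(NBLOCK_DEMO, SUM_CASE)
    }
    return total;
}
```

Raising the bound means adding more REPEAT_k lines, which is the bookkeeping that Boost.Preprocessor already provides.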



Answer 2:


Template-based unrolling:

template<int N>
struct loopUnroller
{
  template<typename Operation>
  inline void operator()(Operation& op) { op(); loopUnroller<N-1>()(op); }
};

// Base case: nothing left to call, so loopUnroller<N> invokes op exactly N times.
template<>
struct loopUnroller<0>
{
  template<typename Operation>
  inline void operator()(Operation&) { }
};

A call to loopUnroller<6>()(Foo) will likely be inlined, but it also contains a call to an inlined loopUnroller<5>()(Foo), and so on. Each level adds one extra call to Foo().
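
As a quick check of the recursion, here is a self-contained sketch of the linear unroller with a counting functor. The names Unroll, Counter and count_calls_6 are hypothetical, and the base case is assumed to be empty so that Unroll<N> makes exactly N calls.

```cpp
// Primary template: one call to op, then recurse with N - 1.
template <int N>
struct Unroll {
    template <typename Operation>
    void operator()(Operation& op) const {
        op();
        Unroll<N - 1>()(op);
    }
};

// Base case: zero iterations left, so nothing is called.
template <>
struct Unroll<0> {
    template <typename Operation>
    void operator()(Operation&) const {}
};

// Hypothetical operation that just counts its invocations.
struct Counter {
    int calls = 0;
    void operator()() { ++calls; }
};

// Runs the unroller six times over a fresh counter and reports the call count.
inline int count_calls_6() {
    Counter c;
    Unroll<6>()(c);
    return c.calls;
}
```

Whether the six calls end up as straight-line code still depends on the compiler's inlining decisions; the template only makes the iteration count a compile-time fact.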

If your compiler refuses to inline 16 levels deep, there's a simple fix:

template<>
struct loopUnroller<16>
{
  template<typename Operation>
  inline void operator()(Operation& op) {
        op(); op(); op(); op();
        op(); op(); op(); op();
        op(); op(); op(); op();
        op(); op(); op(); op();
  }
};

With logarithmic recursion depth:

// Needs a loopUnroller<0> specialization that does nothing, to end the recursion.
template<int N>
struct loopUnroller
{
  template<typename Operation>
  inline void operator()(Operation& op) {
       loopUnroller<N/2>()(op);
       loopUnroller<N/2>()(op);
       if (N%2) { op(); } // Will be optimized out if N is even.
  }
};
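
The halving scheme can be checked with a small self-contained sketch. LogUnroll, Counter and count_log are hypothetical names, and the empty specialization for 0 is an assumption needed to stop the recursion. Since N = 2 * (N / 2) + N % 2, the two recursive calls plus the optional extra call add up to exactly N invocations, while the template nesting depth grows only with log2(N).

```cpp
// Two half-sized unrollers, plus one extra call when N is odd.
template <int N>
struct LogUnroll {
    template <typename Operation>
    void operator()(Operation& op) const {
        LogUnroll<N / 2>()(op);
        LogUnroll<N / 2>()(op);
        if (N % 2) { op(); }  // constant condition, resolved at compile time
    }
};

// Base case: ends the recursion and calls nothing.
template <>
struct LogUnroll<0> {
    template <typename Operation>
    void operator()(Operation&) const {}
};

// Hypothetical operation that just counts its invocations.
struct Counter {
    int calls = 0;
    void operator()() { ++calls; }
};

// Returns how many times LogUnroll<N> invoked the operation.
template <int N>
int count_log() {
    Counter c;
    LogUnroll<N>()(c);
    return c.calls;
}
```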

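With an iteration count chosen at run time:

template<int L>
struct loopUnroller
{
  template<typename Operation>
  inline void operator()(Operation& op, int i) {
     if (i & (1<<L)) {
       for(int j = 0; j != 1<<L; ++j)
       {
         op();
       }
     }
     loopUnroller<L-1>()(op, i);
  }
};

// Ends the recursion once every bit from L down to 0 has been handled.
template<>
struct loopUnroller<-1>
{
  template<typename Operation>
  inline void operator()(Operation&, int) { }
};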

The for loop now has a trip count that is fixed at compile time, making it likely to be unrolled. So, for L = 5, you get unrolled loops of length 32, 16, 8, 4, 2 and 1 (assuming no specializations other than the one that terminates the recursion), and at run time the bits of i select which of those loops execute.
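
A self-contained sketch of the bit-driven variant follows. BitUnroll, Counter and count_dynamic are hypothetical names, and the starting level 4 is an arbitrary choice that covers i from 0 to 31. Each set bit of i triggers one fixed-length loop, so the total number of calls equals i.

```cpp
// Level L handles bit L of i with a loop of fixed length 1 << L.
template <int L>
struct BitUnroll {
    template <typename Operation>
    void operator()(Operation& op, int i) const {
        if (i & (1 << L)) {
            for (int j = 0; j != (1 << L); ++j) {
                op();
            }
        }
        BitUnroll<L - 1>()(op, i);
    }
};

// Terminating case below bit 0.
template <>
struct BitUnroll<-1> {
    template <typename Operation>
    void operator()(Operation&, int) const {}
};

// Hypothetical operation that just counts its invocations.
struct Counter {
    int calls = 0;
    void operator()() { ++calls; }
};

// Starts at level 4, so bits 0..4 are examined; valid for 0 <= i < 32.
inline int count_dynamic(int i) {
    Counter c;
    BitUnroll<4>()(c, i);
    return c.calls;
}
```

Note that this replaces the jump table with one conditional per bit, so it trades the single indirect jump the question asks for against a short chain of branches.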



Source: https://stackoverflow.com/questions/16982283/c-template-for-unrolling-a-loop-using-a-switch
