问题
I have the following code:
template <int size>
inline uint hashfn( const char* pStr )
{
uint result = *pStr;
switch ( size )
{
case 10:
result *= 4;
result += *pStr;
case 9:
result *= 4;
result += *pStr;
...
...
case 2:
result *= 4;
result += *pStr;
}
return result;
}
This code is a hash function for DNA sequences of certain lenghts where the length is a template parameter. It is an unrolled loop with a switch statement to jump in at the right spot. The size is however a constant since it is a template parameter. Could I specialize this for certain size values? Maybe with something like:
template <int 2>
inline uint hashfn( const char* pStr )
{
uint result = *pStr;
result *= 4;
++pStr;
result += *pStr;
return result;
}
回答1:
I would tend to do it recursively with templates.
E.g. :
template<class TOp,int factor>
struct recursive_unroll {
__forceinline static void result( TOp& k ) {
k();
recursive_unroll<TOp,factor-1>::result( k );
}
};
template<class TOp>
struct recursive_unroll<TOp,0> {
__forceinline static void result( TOp& k ) {}
};
struct op {
op( const char* s ) : res( 0 ), pStr( s ) {}
unsigned int res;
const char* pStr;
__forceinline void operator()() {
res *= 4;
res += *pStr;
++pStr;
//std::cout << res << std::endl;
}
};
char str[] = "dasjlfkhaskjfdhkljhsdaru899weiu";
int _tmain(int argc, _TCHAR* argv[])
{
op tmp( str );
recursive_unroll<op,sizeof( str ) >::result( tmp );
std::cout << tmp.res << std::endl;
return 0;
}
This produces optimal code for me. Without __forceinline the code is not properly inlined.
You should always bench your code before using such optimizations. And then you should look at the assembly and decipher what your compiler already does for you. But in this case, it seems to be an boost (for me).
__forceinline is a Microsoft Visual Studio specific extension. The compiler should generate optimal code, but for this it doesn't. So here I used this extension.
回答2:
Please read up about loop unrolling on wikipedia. The whole point is to save comparisons on the loop variable. Did you profile the code? I do not see how this can save you cycles compared to a loop.
Any modern compiler should completely unroll a loop with a small static loop count for you.
I also hope you don't use the hashes for modulo-based hash tables, as you will lose the upper bits of your hash.
回答3:
There's no loop involved in your original code, so it cannot be unrolled.
回答4:
You are allowed to specialize like that. However, I think you are focusing on the optimization of unrolling the loop. Until you profile your code and show that it is a bottleneck, you are better off not prematurely optimizing.
回答5:
I dont know if the semantics is right, but in this case the function template itself could be recursive:
template <int size>
inline uint hashfn( const char* pStr ) {
uint result = *pStr;
result *= 4;
return result + hashfn<size-1>(++pStr);
}
//case 0 stops
template <>
inline uint hashfn<0>( const char* pStr ) {
return 0;
}
As Christopher said, just be sure that it gets inlined...
回答6:
For code
template <size_t N>
void
function (float *a, float *b, float *c)
{
for (size_t i = 0; i < N; ++i)
{
c[i] = a[i] * b[i];
}
}
my compilers (msvc 2005, gcc 4.3) unroll loops automatically.
回答7:
I have the following code: ... (code that uses Duff's device) ...
It is an unrolled loop with a switch statement to jump in at the right spot. The size is however a constant since it is a template parameter. Could I specialize this for certain size values?
You certainly can, but there will be a lot of boilerplate involved, for each possible size. With appropriate levels of abstraction, and use of the Boost metaprogramming and preprocessor libraries you might even get this down to something reasonable.
Your maintenance costs will go up, so I'll echo others and recommend you make sure you really should do this now that you know you can.
来源:https://stackoverflow.com/questions/1099427/can-one-unroll-a-loop-when-working-with-an-integer-template-parameter