I have a C++ snippet below with a run-time for loop:
for (int i = 0; i < I; i++)
    for (int j = 0; j < J; j++)
        A(row(i,j), column(i,j)) = f(i,j);
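A good compiler should do the unrolling for you. In GCC, for instance, -funroll-loops requests it explicitly (it is not turned on by -O2), and -O3 enables -fpeel-loops, which can completely peel small loops whose iteration count is known at compile time.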
If you unroll manually without measuring carefully and really knowing what you are doing, you are liable to end up with slower code. In your case, for example, manual unrolling may prevent the compiler from applying a loop-interchange or strip-mine optimization (look for -floop-interchange and -floop-strip-mine in the gcc docs).
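To make the "let the compiler do it" option concrete, here is a minimal sketch of the plain-loop version. The bounds `I` and `J`, the `Matrix` alias, the body of `f` and the `demo` helper are all assumptions made up for illustration; only the loop shape comes from the question. With the trip counts known at compile time, the optimizer has everything it needs to decide on unrolling, which you can check by inspecting the generated assembly.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical compile-time bounds standing in for I and J from the question.
constexpr std::size_t I = 3;
constexpr std::size_t J = 5;

// Hypothetical stand-in for f(i, j); the real f is not shown in the question.
constexpr double f(std::size_t i, std::size_t j) { return 10.0 * i + j; }

// Hypothetical matrix type: I rows of J doubles.
using Matrix = std::array<std::array<double, J>, I>;

// Plain nested loops. Both trip counts are constants, so the optimizer
// is free to unroll, interchange or vectorize as it sees fit.
void fill(Matrix& A) {
    for (std::size_t i = 0; i < I; ++i)
        for (std::size_t j = 0; j < J; ++j)
            A[i][j] = f(i, j);
}

// Usage: fill a matrix and spot-check two entries.
void demo() {
    Matrix A{};
    fill(A);
    assert(A[0][0] == 0.0);
    assert(A[2][4] == 24.0);
}
```

Measure this version first; it is the baseline any hand-unrolled variant has to beat.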
This is the way to do it directly:
// I and J must be compile-time constants here.
template <int i, int j>
struct inner
{
    static void value()
    {
        A(row<i,j>::value, column<i,j>::value) = f<i,j>::value;
        inner<i, j+1>::value();  // next column
    }
};

// End of a row: j has reached J.
template <int i> struct inner<i, J> { static void value() {} };

template <int i>
struct outer
{
    static void value()
    {
        inner<i, 0>::value();    // expand row i
        outer<i+1>::value();     // next row
    }
};

// End of the matrix: i has reached I.
template <> struct outer<I> { static void value() {} };

void test()
{
    outer<0>::value();
}
You can pass A through as a parameter to each of the value() functions if necessary.
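As a rough sketch of what passing A through could look like: everything below that is not in the snippet above is an assumption for illustration, namely the `Matrix` alias, the small values of `I` and `J`, the trivial definitions of `row`, `column` and `f`, and the `demo` helper.

```cpp
#include <array>
#include <cassert>

// Hypothetical compile-time bounds.
constexpr int I = 2;
constexpr int J = 3;

// Hypothetical matrix type: I rows of J doubles.
using Matrix = std::array<std::array<double, J>, I>;

// Hypothetical compile-time index maps and values, for illustration only.
template <int i, int j> struct row    { static constexpr int value = i; };
template <int i, int j> struct column { static constexpr int value = j; };
template <int i, int j> struct f      { static constexpr int value = 10 * i + j; };

template <int i, int j>
struct inner {
    static void value(Matrix& A) {
        A[row<i, j>::value][column<i, j>::value] = f<i, j>::value;
        inner<i, j + 1>::value(A);  // next column, same matrix
    }
};
// End of a row: nothing left to do.
template <int i> struct inner<i, J> { static void value(Matrix&) {} };

template <int i>
struct outer {
    static void value(Matrix& A) {
        inner<i, 0>::value(A);   // expand row i
        outer<i + 1>::value(A);  // next row
    }
};
// End of the matrix: nothing left to do.
template <> struct outer<I> { static void value(Matrix&) {} };

// Usage: expand all I*J assignments into one call and spot-check the result.
void demo() {
    Matrix A{};
    outer<0>::value(A);
    assert(A[0][0] == 0.0);
    assert(A[1][2] == 12.0);
}
```

Taking the matrix by reference keeps the recursion stateless, so the compiler can still inline the whole chain of calls.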
Here's a way with variadic templates that doesn't require hard-coded I and J (it needs C++17 for the fold expressions):
#include <cstddef>
#include <utility>

template <std::size_t j, class Columns>
struct Inner;

template <class Columns, class Rows>
struct Outer;

// The packs are std::size_t so that they match std::index_sequence exactly;
// deducing an int pack from it is rejected by some compilers.
template <std::size_t j, std::size_t... i>
struct Inner<j, std::index_sequence<i...>>
{
    static void value() { (A(column<i, j>::value, row<i, j>::value), ...); }
};

template <std::size_t... j, class Columns>
struct Outer<std::index_sequence<j...>, Columns>
{
    static void value() { (Inner<j, Columns>::value(), ...); }
};

template <std::size_t I, std::size_t J>
void expand()
{
    Outer<std::make_index_sequence<I>, std::make_index_sequence<J>>::value();
}

void test()
{
    expand<3, 5>();
}
(snippet with generated assembly: https://godbolt.org/g/DlgmEl)
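For something you can compile as-is, here is a self-contained sketch of the same index_sequence idea written with function templates instead of structs. It is a variation, not the code behind the link: the `Matrix` alias, the body of `f`, and the names `fill_row`, `fill_rows` and `demo` are hypothetical, and the matrix is passed in explicitly. It requires C++17.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical stand-in for f(i, j).
constexpr double f(std::size_t i, std::size_t j) { return 10.0 * i + j; }

// Hypothetical matrix type: I rows of J doubles.
template <std::size_t I, std::size_t J>
using Matrix = std::array<std::array<double, J>, I>;

// Expands the assignments for one fixed row i over all column indices j...
template <std::size_t i, std::size_t I, std::size_t J, std::size_t... j>
void fill_row(Matrix<I, J>& A, std::index_sequence<j...>) {
    ((A[i][j] = f(i, j)), ...);  // comma fold: one assignment per column
}

// Expands one fill_row call per row index i...
template <std::size_t I, std::size_t J, std::size_t... i>
void fill_rows(Matrix<I, J>& A, std::index_sequence<i...>) {
    (fill_row<i>(A, std::make_index_sequence<J>{}), ...);
}

// Entry point: I and J are deduced from the matrix type.
template <std::size_t I, std::size_t J>
void expand(Matrix<I, J>& A) {
    fill_rows(A, std::make_index_sequence<I>{});
}

// Usage: expand a 3x5 fill and spot-check two entries.
void demo() {
    Matrix<3, 5> A{};
    expand(A);
    assert(A[0][0] == 0.0);
    assert(A[2][4] == 24.0);
}
```

Because the bounds come from the argument type, nothing is hard-coded and the same `expand` works for any matrix size.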
Check out Template Metaprograms and the bubble sort implementations.
f would need to return a double - that can't be done at compile time.
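A note on the double restriction, in case it helps: what is disallowed before C++20 is a double as a non-type template parameter. A constexpr function returning double can be evaluated at compile time since C++11, so a value member like the one below is possible. This is only a sketch; the body of `f` and the wrapper name `f_at` are made up for illustration.

```cpp
// Hypothetical stand-in for f; any constexpr-friendly expression would do.
constexpr double f(int i, int j) { return 0.5 * i + 0.25 * j; }

// The indices are template parameters; the double result is a constant.
template <int i, int j>
struct f_at {
    static constexpr double value = f(i, j);
};

// Both checks are evaluated by the compiler, not at run time.
static_assert(f(2, 4) == 2.0, "f is usable in constant expressions");
static_assert(f_at<2, 4>::value == 2.0, "so is the wrapped value");
```

Whether this applies depends on whether your real f can be written as a constexpr function at all.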