I am new to optimizing code with SSE/SSE2 instructions and until now I have not gotten very far. To my knowledge a common SSE-optimized function would look like this:
<
This is basically what I'm using. By making the integer a template, I ensure it's expanded compile time, so I won't end up with a slow modulo operation whatever I do.
I always like checking my input, so hence the compile time assertion. If your alignment value is wrong, well then it won't compile...
template
struct IsAligned
{
static_assert((alignment & (alignment - 1)) == 0, "Alignment must be a power of 2");
static inline bool Value(const void * ptr)
{
return (((uintptr_t)ptr) & (alignment - 1)) == 0;
}
};
To see what's going on, you can use this:
// 1 of them is aligned...
int* ptr = new int[8];
for (int i = 0; i < 8; ++i)
std::cout << IsAligned<32>::Value(ptr + i) << std::endl;
// Should give '1'
int* ptr2 = (int*)_aligned_malloc(32, 32);
std::cout << IsAligned<32>::Value(ptr2) << std::endl;