You can add one additional modulo operation to avoid ever shifting by the full width of the type (32 bits for a 32-bit type), which is undefined behavior, but I'm not convinced this is faster than using an if check in conjunction with branch prediction.
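template <class T> inline T rotlMod(T x, unsigned int y)
{
    // Reduce the shift count into [0, bits) first.
    y %= sizeof(T)*8;
    // The second modulo turns a right shift by the full width (when y == 0) into a shift by 0.
    return T((x<<y) | (x>>((sizeof(T)*8-y) % (sizeof(T)*8))));
}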
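For comparison, here is a rough sketch of the if-check alternative mentioned above, next to the modulo version. The names `rotlIf` and `checkRotations` are mine, not from any library, and I'm assuming `T` is an unsigned integer type and that a byte is 8 bits (as the `sizeof(T)*8` in the original already does). It only shows that the two agree; it says nothing about which one is faster.

```cpp
#include <cassert>
#include <cstdint>

// Modulo version, restated so this snippet compiles on its own.
template <class T>
inline T rotlMod(T x, unsigned int y)
{
    const unsigned int bits = sizeof(T) * 8;
    y %= bits;
    return T((x << y) | (x >> ((bits - y) % bits)));
}

// Hypothetical branch-based variant: skip the shifts entirely when y == 0,
// so the right shift count (bits - y) is always in [1, bits - 1].
template <class T>
inline T rotlIf(T x, unsigned int y)
{
    const unsigned int bits = sizeof(T) * 8;
    y %= bits;
    if (y == 0)
        return x;
    return T((x << y) | (x >> (bits - y)));
}

// Sanity check: both variants agree for shift counts below, at, and above
// the type width.
inline void checkRotations()
{
    const std::uint32_t x = 0x80000001u;
    for (unsigned int y = 0; y < 70; ++y)
        assert(rotlMod(x, y) == rotlIf(x, y));
}
```

Whether the extra modulo or the branch wins will depend on the compiler and target, so measure on your own setup before picking one.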