GCC\'s vector extensions offer a nice, reasonably portable way of accessing some SIMD instructions on different hardware architectures without resorting to hardware specific
You could use an initializer to load the values, i.e. do
const vec16qi e = { buf[0], buf[1], ... , buf[15] }
and hope that GCC turns this into a SSE load instruction. I'd verify that with a dissassembler, though ;-). Also, for better performance, you try to make buf 16-byte aligned, and inform that compiler via an aligned attribute. If you can guarantee that the input buffer will be aligned, process it bytewise until you've reached a 16-byte boundard.