Loading data for GCC's vector extensions
GCC's vector extensions offer a nice, reasonably portable way of accessing some SIMD instructions on different hardware architectures without resorting to hardware-specific intrinsics (or auto-vectorization).

A real use case is calculating a simple additive checksum. The one thing that isn't clear is how to safely load data into a vector.

    typedef char v16qi __attribute__ ((vector_size(16)));

    static uint8_t checksum(uint8_t *buf, size_t size)
    {
        assert(size%16 == 0);
        uint8_t sum = 0;

        v16qi vec = {0};
        for (size_t i=0; i<(size/16); i++)
        {
            // XXX: Yuck! Is there a better way?
            vec += *((v16qi*)