Loading 8 chars from memory into an __m256 variable as packed single precision floats
Question

I am optimizing a Gaussian blur algorithm for an image, and I want to replace the float buffer[8] in the code below with an __m256 intrinsic variable. What series of instructions is best suited for this task?

    // unsigned char *new_image is loaded with data
    ...
    float buffer[8];
    buffer[x    ] = new_image[x];
    buffer[x + 1] = new_image[x + 1];
    buffer[x + 2] = new_image[x + 2];
    buffer[x + 3] = new_image[x + 3];
    buffer[x + 4] = new_image[x + 4];
    buffer[x + 5] = new_image[x + 5];
    buffer[x
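
One possible approach, sketched here under the assumption that the target CPU supports AVX2: load the 8 bytes into the low half of an __m128i, zero-extend them to eight 32-bit integers, then convert those to floats. The helper names load8_u8_as_ps and convert8 and the TARGET_AVX2 macro are hypothetical, introduced only for illustration; the intrinsics themselves come from immintrin.h.

```c
#include <immintrin.h>

/* Assumption: GCC or Clang. The target attribute lets these functions use
   AVX2 even when the file is not compiled with -mavx2. On other compilers,
   enable AVX2 through the compiler's own options instead. */
#if defined(__GNUC__) && !defined(__AVX2__)
#define TARGET_AVX2 __attribute__((target("avx2")))
#else
#define TARGET_AVX2
#endif

/* Hypothetical helper: read 8 unsigned chars starting at src and return
   them as 8 packed single-precision floats. */
TARGET_AVX2 static __m256 load8_u8_as_ps(const unsigned char *src)
{
    /* Load exactly 8 bytes into the low 64 bits; the upper 64 bits are zeroed. */
    __m128i bytes = _mm_loadl_epi64((const __m128i *)src);
    /* Zero-extend each of the 8 bytes to a 32-bit integer (AVX2). */
    __m256i ints = _mm256_cvtepu8_epi32(bytes);
    /* Convert the 8 integers to 8 floats. */
    return _mm256_cvtepi32_ps(ints);
}

/* Hypothetical wrapper: store the converted values into a plain float
   array, which makes the result easy to inspect or compare against the
   scalar version. dst must have room for 8 floats. */
TARGET_AVX2 static void convert8(const unsigned char *src, float *dst)
{
    _mm256_storeu_ps(dst, load8_u8_as_ps(src));
}
```

With this sketch, the scalar assignments would collapse to something like __m256 buffer = load8_u8_as_ps(new_image + x); inside a function that is itself compiled for AVX2. Zero extension is used because the source is unsigned char; for signed pixel data, _mm256_cvtepi8_epi32 would be the sign-extending counterpart.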