I am in the process of writing a 2D/planar image data resampler, and the highest-performance 2D convolution approach I have so far needs to shift a long float32 vector between AV