Why gcc autovectorization does not work on convolution matrix biger than 3x3?
问题 I've implemented the following program for convolution matrix #include <stdio.h> #include <time.h> #define NUM_LOOP 1000 #define N 128 //input or output dimention 1 #define M N //input or output dimention 2 #define P 5 //convolution matrix dimention 1 if you want a 3x3 convolution matrix it must be 3 #define Q P //convolution matrix dimention 2 #define Csize P*Q #define Cdiv 1 //div for filter #define Coffset 0 //offset //functions void unusual(); //unusual implementation of convolution void