I have a matrix A of size (m * l * 4), where m is around 100,000 and l = 100. The size of the list is always equal to n, with n <= m. I want to do matrix addition of given list of
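Swap the `i` loop and the `j` loop in the second part. The innermost loop then walks through adjacent memory locations instead of jumping by a whole row each iteration, which makes the function more cache-friendly.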
for(int j=0;j
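The C sketch below shows why loop order matters. It does not reproduce the question's code, which is truncated above. The helper names `column_sums_strided` and `column_sums_contiguous` and the row-major layout are assumptions made for illustration. Both functions compute the column sums of a matrix and differ only in which loop is innermost.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helpers for illustration only.
   The matrix is rows x cols, stored row-major in `a`:
   element (row, col) lives at a[row * cols + col]. */

/* Cache-unfriendly order: the inner loop walks down a column,
   so consecutive accesses are `cols` doubles apart in memory. */
void column_sums_strided(const double *a, size_t rows, size_t cols, double *out)
{
    assert(a != NULL && out != NULL);
    for (size_t col = 0; col < cols; ++col) {
        double s = 0.0;
        for (size_t row = 0; row < rows; ++row)
            s += a[row * cols + col];
        out[col] = s;
    }
}

/* Interchanged order: the inner loop walks along a row,
   so consecutive accesses are adjacent in memory. */
void column_sums_contiguous(const double *a, size_t rows, size_t cols, double *out)
{
    assert(a != NULL && out != NULL);
    for (size_t col = 0; col < cols; ++col)
        out[col] = 0.0;
    for (size_t row = 0; row < rows; ++row) {
        const double *r = a + row * cols;  /* start of this row */
        for (size_t col = 0; col < cols; ++col)
            out[col] += r[col];
    }
}
```

Both functions add the rows of each column in the same order, so they return identical results. Only the memory access pattern differs. Which of your indices should be innermost depends on how your code indexes the array. The rule is that the innermost loop should vary the index that is contiguous in memory.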
Also, I hope you did not forget the `-O3` flag.