Computation is optimized only if variable updated in loop is local

Submitted by 梦想与她 on 2019-12-12 11:08:36

Question


With optimizations enabled, the following function is vectorized and the computation is performed in registers (the result is returned in eax). The generated machine code can be seen, for example, here: https://godbolt.org/z/VQEBV4.

int sum(int *arr, int n) {
  int ret = 0;
  for (int i = 0; i < n; i++)
    ret += arr[i];
  return ret;
}

However, if I make ret a global variable (or a parameter of type int&), vectorization is not used and the compiler stores the updated ret to memory in every iteration. Machine code: https://godbolt.org/z/NAmX4t.

int ret = 0;

int sum(int *arr, int n) {
  for (int i = 0; i < n; i++)
    ret += arr[i];
  return ret;
}

I don't understand why the optimizations (vectorization and keeping the computation in registers) are prevented in the latter case. There is no threading, and the increments are not even performed atomically. Moreover, this behavior seems to be consistent across compilers (GCC, Clang, Intel), so I believe there must be some reason for it.


Answer 1:


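If ret is global rather than local, arr might alias ret, which reduces the opportunity to optimize. A store to ret could then change a value that a later iteration loads through arr, so the compiler has to write ret back to memory in every iteration instead of keeping the running sum in a register.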
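The effect of aliasing can be made concrete with a small sketch. It mirrors the int& variant from the question by passing the accumulator through a pointer; the function names sum_through_pointer and sum_in_local are hypothetical and chosen only for illustration. The second function is the transformation the compiler would like to perform on its own (accumulate in a register, store once), and it gives the same result only when the accumulator does not overlap the array.

```c
/* Accumulates directly through *acc, as the global/reference version does.
   If acc points into arr, each store can change a value read later. */
int sum_through_pointer(int *acc, const int *arr, int n) {
  for (int i = 0; i < n; i++)
    *acc += arr[i];
  return *acc;
}

/* Accumulates in a local and stores once at the end.
   Equivalent to the version above only when *acc does not alias arr. */
int sum_in_local(int *acc, const int *arr, int n) {
  int local = *acc;          /* read the starting value once */
  for (int i = 0; i < n; i++)
    local += arr[i];         /* the local can stay in a register */
  *acc = local;              /* single store after the loop */
  return local;
}
```

With a separate accumulator initialized to 0 and the array {1, 2, 3, 4}, both functions return 10. If the accumulator is instead the last element of that same array, the first returns 20 (the final iteration adds the already updated element to itself) while the second returns 14. Because the two can disagree, the compiler cannot silently replace one with the other. Writing the local accumulator by hand, or, in C99, qualifying the pointers with restrict, is how the programmer can promise that no such overlap exists.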



Source: https://stackoverflow.com/questions/54177320/computation-is-optimized-only-if-variable-updated-in-loop-is-local
