How to stop GCC from breaking my NEON intrinsics?

﹥>﹥吖頭↗ 提交于 2019-11-29 10:44:55

Broadly, the class of optimisation you are seeing here is known as "instruction scheduling". GCC uses instruction scheduling to try to build a better schedule for the instructions in each basic block of your program. Here, a "schedule" refers to any correct ordering of the instructions in a block, and a "better" schedule might be one which avoids stalls and other pipeline hazards, or one which reduces the live range of variables (resulting in better register allocation), or some other ordering goal on the instructions.

To avoid stalls due to hazards, GCC uses a model of the pipeline of the processor you are targeting (see here for details of the specification language used for these, and here for an example pipeline model). This model gives some indication to the GCC scheduling algorithms of the functional units of a processor, and the execution characteristics of instructions on those functional units. GCC can then schedule instructions to minimise structural hazards due to multiple instructions requiring the same processor resources.

Without a -mcpu or -mtune option (to the compiler), or a --with-cpu, or --with-tune option (to the configuration of the compiler), GCC for ARM or AArch64 will try to use a representative model for the architecture revision you are targeting. In this case, -march=armv7-a, causes the compiler to try to schedule instructions as if -mtune=cortex-a8 were passed on the command line.

So what you are seeing in your output is GCC's attempt at transforming your input in to a schedule it expects to execute well when running on a Cortex-A8, and to run reasonably well on processors which implement the ARMv7-A architecture.

To improve on this you can try:

  • Explicitly setting the processor you are targeting (-mcpu=cortex-a7)
  • Disabling instruction scheduling entirely (`-fno-schedule-insns -fno-schedule-insns2)

Note that disabling instruction scheduling entirely may well cause you problems elsewhere, as GCC will no longer be trying to reduce pipeline hazards across your code.

Edit With regards to your edit, performance bugs in GCC can be reported in the GCC Bugzilla (see https://gcc.gnu.org/bugs/ ) just as correctness bugs can be. Naturally with all optimisations there is some degree of heuristic involved and a compiler may not be able to beat a seasoned assembly programmer, but if the compiler is doing something especially egregious it can be worth highlighting.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!