GCC ARM Performance drop

Submitted by 你离开我真会死。 on 2021-02-11 18:19:25

Question


I stumbled upon a very strange issue with GCC: a 25% drop in performance. Here is the story.

I have a piece of software that is fp32 compute-intensive (neural networks compiled with TVM). I compile it for ARM (an rk3399 device). Here is the system info:

gcc -v

Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/arm-linux-gnueabihf/5/lto-wrapper
Target: arm-linux-gnueabihf
Configured with: ../src/configure -v --with-pkgversion='Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.12' --with-bugurl=file:///usr/share/doc/gcc-5/README.Bugs --enable-languages=c,ada,c++,java,go,d,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-5 --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-libitm --disable-libquadmath --enable-plugin --with-system-zlib --disable-browser-plugin --enable-java-awt=gtk --enable-gtk-cairo --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-5-armhf/jre --enable-java-home --with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-5-armhf --with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-5-armhf --with-arch-directory=arm --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --enable-objc-gc --enable-multiarch --enable-multilib --disable-sjlj-exceptions --with-arch=armv7-a --with-fpu=vfpv3-d16 --with-float=hard --with-mode=thumb --disable-werror --enable-multilib --enable-checking=release --build=arm-linux-gnueabihf --host=arm-linux-gnueabihf --target=arm-linux-gnueabihf
Thread model: posix
gcc version 5.4.0 20160609 (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.12)

uname -a

Linux FriendlyELEC 4.4.143 #1 SMP Tue Nov 20 11:10:11 CST 2018 aarch64 aarch64 aarch64 GNU/Linux

lscpu

Architecture:          aarch64 
Byte Order:            Little Endian
CPU(s):                6
On-line CPU(s) list:   0-5
Thread(s) per core:    1
Core(s) per socket:    3
Socket(s):             2
Model name:            ARMv8 Processor rev 2 (v8l)
CPU max MHz:           1800.0000
CPU min MHz:           408.0000
Hypervisor vendor:     horizontal
Virtualization type:   full

The code was initially "slow" and built as C++11, so I decided to try C++17 and C++14. C++17 was not supported, but C++14 was. I switched to C++14 and, voila, got a boost of around 25% in performance. I tested it carefully to make sure the boost was real and not a measurement mistake. I had this boost for a week; then my device rebooted and the boost was gone!

It may sound crazy, but I'm confident in my code and in the measurements I took. I didn't set any explicit compile flags before this. Now I'm trying to figure out which GCC compile flags would reclaim what was lost, but I don't have much experience with GCC. What could be the issue here? What flags can affect performance that much?

The code uses .so files compiled with LLVM and GCC, with this TVM target:

llvm -device=arm_cpu -target=armv8l-linux-gnueabihf -mattr=+neon,fp-armv8

Answer 1:


"What flags can affect performance that much?"

Turning on the optimizer with -O1, -O2, or -O3 can have a dramatic effect (the default, -O0, is an unoptimized build).

Enabling link-time optimization with -flto can often give significant improvements as well.

See also the GCC manual for more.




Answer 2:


It's not GCC's fault; it's a CPU frequency scaling problem. I had an ARM device running Linux (Ubuntu), and the strange behavior and inconsistent benchmark results were caused by the way the OS governed the CPU frequency.



Source: https://stackoverflow.com/questions/60208814/gcc-arm-performance-drop
