Arm Neon Intrinsics vs hand assembly
https://web.archive.org/web/20170227190422/http://hilbert-space.de/?p=22

This site, which is quite dated, shows that hand-written assembly gives a much greater improvement than the intrinsics. I am wondering whether that is still true now, in 2012. Has the optimization of intrinsics improved in the GNU cross compiler?

My experience is that the intrinsics haven't really been worth the trouble. It's too easy for the compiler to inject extra register unload/load steps between your intrinsics. The effort to get it to stop doing that is more complicated than just writing