I'm trying to implement the checksum computation code (2's complement addition) for NEON, using intrinsics. The current checksum computation is being carried out on ARM.
A few things you can improve:
- disp: this looks like debug code that got left in?
- gcc -O3 ... to get maximum benefit from compiler optimisation
- goto! (Doesn't affect performance but is evil.)
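To make these points concrete, here is a minimal sketch of what a cleaned-up loop might look like: no debug output, no goto, and a simple structure that the optimiser can work with. The original code is not shown here, so everything in it is an assumption: the function name `checksum16` is hypothetical, the input is assumed to be an array of 16-bit words, and the "2's complement addition" is assumed to mean a plain 32-bit sum that wraps modulo 2^32. The NEON path is only compiled when the compiler defines `__ARM_NEON`; otherwise the scalar loop does all the work.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper (not from the original post): adds up n_words
 * 16-bit words into a 32-bit accumulator that wraps modulo 2^32,
 * i.e. ordinary two's complement addition. */
uint32_t checksum16(const uint16_t *data, size_t n_words)
{
    uint32_t sum = 0;
    size_t i = 0;

#if defined(__ARM_NEON)
    /* Four 32-bit partial sums, all starting at zero. */
    uint32x4_t acc = vdupq_n_u32(0);

    /* Main loop: consume 8 words per iteration. */
    for (; i + 8 <= n_words; i += 8) {
        uint16x8_t v = vld1q_u16(data + i);
        /* Add adjacent pairs of 16-bit lanes, widen to 32 bits,
         * and accumulate into acc. */
        acc = vpadalq_u16(acc, v);
    }

    /* Horizontal reduction of the four partial sums to one value. */
    uint32x2_t half = vadd_u32(vget_low_u32(acc), vget_high_u32(acc));
    half = vpadd_u32(half, half);
    sum = vget_lane_u32(half, 0);
#endif

    /* Scalar tail (or the whole input when NEON is unavailable). */
    for (; i < n_words; i++)
        sum += data[i];

    return sum;
}
```

Because each lane wraps modulo 2^32 and the lanes are added together at the end, the NEON path should give the same result as the scalar loop for any input length. If your checksum needs something else (an end-around carry, say, or a final fold to 16 bits), that step would have to be added after the reduction.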