Help me improve some more SSE2 code

烈酒焚心 提交于 2019-12-05 22:01:08

Two suggestions:

  • run this code in a test harness under a decent profiler on Core 2 (e.g. Zoom) to see where the hotspots and dependency/other stalls are

  • re-write the SIMD code using intrinsics and then let the compiler handle register allocation, instruction scheduling and other optimisations - a decent compiler such as ICC, or even gcc will do a lot better job than your hand-coded assembly. And as a bonus you can also re-target for different x86 CPU families without having to re-write your code.

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!