Efficient 4x4 matrix vector multiplication with SSE: horizontal add and dot product - what's the point?

送分小仙女 submitted on 2019-11-26 17:43:44

Horizontal add (HADDPS) and dot product (DPPS) instructions are complex: the processor decomposes them into several simpler micro-operations, which it then executes just like the micro-operations of simple instructions. The exact decomposition is processor-specific, but on recent Intel processors horizontal add is decomposed into 2 SHUFFLE + 1 ADD micro-operations, and dot product into 1 MUL + 1 SHUFFLE + 2 ADD micro-operations. Besides the larger number of micro-operations, these instructions also stress the instruction decoder in the processor pipeline: Intel processors can decode only one such complex instruction per cycle (compared to 4 simple instructions). On AMD Bulldozer the relative cost of these complex instructions is even higher.
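To make the alternative concrete, here is a minimal sketch of a 4x4 matrix-vector product that uses only vertical multiplies and adds plus broadcasts or a transpose, with no horizontal add or dot product instruction. The function names `mat4_mul_vec_colmajor` and `mat4_mul_vec_rowmajor` and the flat `float[16]` storage layouts are assumptions made for illustration, not something taken from the question, and the sketch makes no claim about which variant is faster on a given processor.

```c
#include <xmmintrin.h>  /* SSE1 intrinsics only; no SSE3/SSE4.1 required */

/* Hypothetical helper: y = M * x with M stored column-major,
 * i.e. element (row r, column c) lives at m[4 * c + r].
 * Each column is scaled by one broadcast element of x and the
 * four scaled columns are summed with vertical adds. */
static void mat4_mul_vec_colmajor(const float m[16], const float x[4], float y[4])
{
    __m128 acc = _mm_mul_ps(_mm_loadu_ps(m + 0), _mm_set1_ps(x[0]));
    acc = _mm_add_ps(acc, _mm_mul_ps(_mm_loadu_ps(m + 4),  _mm_set1_ps(x[1])));
    acc = _mm_add_ps(acc, _mm_mul_ps(_mm_loadu_ps(m + 8),  _mm_set1_ps(x[2])));
    acc = _mm_add_ps(acc, _mm_mul_ps(_mm_loadu_ps(m + 12), _mm_set1_ps(x[3])));
    _mm_storeu_ps(y, acc);
}

/* Hypothetical helper: y = M * x with M stored row-major,
 * i.e. element (row r, column c) lives at m[4 * r + c].
 * Each row is multiplied elementwise by x, the four product
 * vectors are transposed, and the horizontal sums become
 * ordinary vertical adds. */
static void mat4_mul_vec_rowmajor(const float m[16], const float x[4], float y[4])
{
    __m128 v  = _mm_loadu_ps(x);
    __m128 p0 = _mm_mul_ps(_mm_loadu_ps(m + 0),  v);
    __m128 p1 = _mm_mul_ps(_mm_loadu_ps(m + 4),  v);
    __m128 p2 = _mm_mul_ps(_mm_loadu_ps(m + 8),  v);
    __m128 p3 = _mm_mul_ps(_mm_loadu_ps(m + 12), v);

    /* After the transpose, lane i of p0..p3 holds the four
     * products that belong to output element i. */
    _MM_TRANSPOSE4_PS(p0, p1, p2, p3);

    _mm_storeu_ps(y, _mm_add_ps(_mm_add_ps(p0, p1), _mm_add_ps(p2, p3)));
}
```

Both helpers use unaligned loads and stores so they accept any `float` array; if the data is known to be 16-byte aligned, `_mm_load_ps` and `_mm_store_ps` could be substituted.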
