C++ SSE filter implementation

Submitted by 别说谁变了你拦得住时间么 on 2019-12-02 07:42:59

If you really need to do this with floating point rather than integer/fixed point, you will need to load your 8 bit data, unpack it to 32 bits (this takes two steps: 8 bit to 16 bit, then 16 bit to 32 bit), and then convert to float. This is very inefficient, though: several instructions go into widening the data alone, and each vector then holds only 4 floats instead of 16 pixels. You should look at doing this with e.g. 16 bit fixed point operations instead.
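To give a feel for the 16 bit fixed point route, here is a minimal sketch. The actual filter is not shown in this answer, so the operation below (scaling each pixel by a gain) is only a stand-in, and the helper name `scale_pixels_q8` and the Q8 format (256 represents 1.0) are assumptions made for illustration. It uses unaligned loads and stores so that it works on any buffer.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cassert>
#include <cstdint>

// Hypothetical helper (not from the original answer): scale 16 unsigned
// 8 bit pixels by a Q8 fixed point gain, where gain_q8 == 256 means 1.0.
inline void scale_pixels_q8(const std::uint8_t* src, std::uint8_t* dst,
                            std::uint16_t gain_q8)
{
    assert(src != nullptr && dst != nullptr);
    assert(gain_q8 <= 256);  // keeps pixel * gain + 128 within 16 bits

    const __m128i zero  = _mm_setzero_si128();
    const __m128i gain  = _mm_set1_epi16(static_cast<short>(gain_q8));
    const __m128i round = _mm_set1_epi16(128);  // 0.5 in Q8, for rounding

    // Load 16 pixels and zero-extend them to two vectors of 8 x 16 bit.
    const __m128i v8 = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src));
    __m128i lo = _mm_unpacklo_epi8(v8, zero);  // pixels 0..7
    __m128i hi = _mm_unpackhi_epi8(v8, zero);  // pixels 8..15

    // (pixel * gain + 128) >> 8, all in 16 bit lanes.
    lo = _mm_srli_epi16(_mm_add_epi16(_mm_mullo_epi16(lo, gain), round), 8);
    hi = _mm_srli_epi16(_mm_add_epi16(_mm_mullo_epi16(hi, gain), round), 8);

    // Narrow back to 16 x 8 bit and store.
    _mm_storeu_si128(reinterpret_cast<__m128i*>(dst), _mm_packus_epi16(lo, hi));
}
```

Here each 16 pixel load turns into just two vectors of 8 x 16 bit, and no int to float conversion is needed.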

Note that with the floating point approach, each 16 pixel load leaves you with 4 blocks of 4 x float to process, i.e. each vector of 16 x 8 bit pixels becomes 4 vectors of 4 x float.

Summary of required intrinsics:

_mm_load_si128(...)       // load 16 x 8 bit values

_mm_unpacklo_epi8(...)    // unpack 8 bit -> 16 bit (interleave with a zero vector for unsigned data)
_mm_unpackhi_epi8(...)

_mm_unpacklo_epi16(...)   // unpack 16 bit -> 32 bit (again interleave with zero)
_mm_unpackhi_epi16(...)

_mm_cvtepi32_ps(...)      // convert 32 bit int -> float
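
Put together, the intrinsics above could be used roughly as follows. This is a minimal sketch that assumes unsigned 8 bit pixels; the helper name `pixels_u8_to_float` is made up for illustration. It also swaps in `_mm_loadu_si128` and `_mm_storeu_ps` so that the buffers do not have to be 16 byte aligned; use `_mm_load_si128` and `_mm_store_ps` if your data is aligned.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cassert>
#include <cstdint>

// Hypothetical helper (not from the original answer): convert 16 unsigned
// 8 bit pixels at src into 16 floats at dst.
inline void pixels_u8_to_float(const std::uint8_t* src, float* dst)
{
    assert(src != nullptr && dst != nullptr);

    const __m128i zero = _mm_setzero_si128();

    // Load 16 x 8 bit values.
    const __m128i v8 = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src));

    // 8 bit -> 16 bit: interleaving with zero zero-extends each pixel.
    const __m128i lo16 = _mm_unpacklo_epi8(v8, zero);  // pixels 0..7
    const __m128i hi16 = _mm_unpackhi_epi8(v8, zero);  // pixels 8..15

    // 16 bit -> 32 bit, same trick.
    const __m128i p0 = _mm_unpacklo_epi16(lo16, zero);  // pixels 0..3
    const __m128i p1 = _mm_unpackhi_epi16(lo16, zero);  // pixels 4..7
    const __m128i p2 = _mm_unpacklo_epi16(hi16, zero);  // pixels 8..11
    const __m128i p3 = _mm_unpackhi_epi16(hi16, zero);  // pixels 12..15

    // 32 bit int -> float: 4 vectors of 4 x float per 16 pixel load.
    _mm_storeu_ps(dst + 0,  _mm_cvtepi32_ps(p0));
    _mm_storeu_ps(dst + 4,  _mm_cvtepi32_ps(p1));
    _mm_storeu_ps(dst + 8,  _mm_cvtepi32_ps(p2));
    _mm_storeu_ps(dst + 12, _mm_cvtepi32_ps(p3));
}
```

In a real filter you would normally keep the four float vectors in registers and do the arithmetic on them directly rather than storing them straight away; the stores are only there to make the result easy to inspect.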