SSE intrinsics: Convert 32-bit floats to UNSIGNED 8-bit integers


走了就别回头了 2020-12-16 06:36

Using SSE intrinsics, I've gotten a vector of four 32-bit floats clamped to the range 0-255 and rounded to the nearest integer. I'd now like to write those four out as bytes.

2 answers

星月不相逢 (original poster)

2020-12-16 07:02
There is no direct conversion from float to byte; _mm_cvtps_pi8 is a composite. _mm_cvtps_pi16 is also a composite, and in this case it just does some pointless work that you then undo with the shuffle. Both also return an inconvenient __m64.

Anyway, we can convert to dwords (signed, but that doesn't matter) and then pack (unsigned) or shuffle them into bytes. _mm_shuffle_(e)pi8 generates a pshufb; Core2 45nm and AMD processors aren't too fond of it, and you have to get a mask from somewhere.

Either way, you don't have to round to the nearest integer first; the conversion does that, at least if you haven't changed the rounding mode. (cvtps2dq rounds according to the mode set in MXCSR, which defaults to round-to-nearest-even.)
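To make the rounding point concrete, here is a minimal sketch in C, assuming an x86 target with SSE2 and the default MXCSR rounding mode. The helper name floats_to_dwords is made up for illustration; _mm_cvtps_epi32 is the intrinsic that maps to cvtps2dq.

```c
#include <emmintrin.h>  /* SSE2: _mm_cvtps_epi32, _mm_storeu_si128 */
#include <stdint.h>

/* Hypothetical helper: convert four floats to four signed dwords.
   Rounding follows the current MXCSR mode, which is
   round-to-nearest-even unless the program has changed it. */
static void floats_to_dwords(const float in[4], int32_t out[4])
{
    __m128  f = _mm_loadu_ps(in);        /* unaligned load of 4 floats */
    __m128i d = _mm_cvtps_epi32(f);      /* cvtps2dq */
    _mm_storeu_si128((__m128i *)out, d); /* unaligned store of 4 dwords */
}
```

With the default mode, ties go to the even neighbour, so 1.5 and 2.5 both become 2. If that differs from the rounding you applied by hand, the two steps are not interchangeable.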

Using packs 1: (not tested). Probably not useful: packusdw already outputs unsigned words, but packuswb then interprets its input as signed words again. Kept around because it is referred to elsewhere.
```
cvtps2dq xmm0, xmm0  
packusdw xmm0, xmm0     ; SSE4.1; caution: saturates to 0..65535, but packuswb reads its input as signed words
packuswb xmm0, xmm0
movd somewhere, xmm0
```
Using different packs:
```
cvtps2dq xmm0, xmm0  
packssdw xmm0, xmm0     ; correct: signed saturation on first step to feed packuswb
packuswb xmm0, xmm0
movd somewhere, xmm0
```
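For readers working with intrinsics rather than assembly, the packssdw / packuswb sequence above might be written roughly as follows. This is a sketch, not a tested drop-in: the function name floats_to_bytes_pack and the memcpy-based store are illustrative choices, and only SSE2 is assumed.

```c
#include <emmintrin.h>  /* SSE2 */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: convert four floats to four unsigned bytes
   using two saturating packs. */
static void floats_to_bytes_pack(const float in[4], uint8_t out[4])
{
    __m128i d = _mm_cvtps_epi32(_mm_loadu_ps(in)); /* cvtps2dq */
    __m128i w = _mm_packs_epi32(d, d);   /* packssdw: dwords -> signed words, saturating */
    __m128i b = _mm_packus_epi16(w, w);  /* packuswb: signed words -> unsigned bytes, saturating */
    int32_t packed = _mm_cvtsi128_si32(b); /* movd: low 4 bytes of the register */
    memcpy(out, &packed, sizeof packed);
}
```

Because both packs saturate, moderately out-of-range inputs are clamped to 0 or 255 here even if the earlier clamp was skipped. Very large floats are a separate matter, since cvtps2dq itself returns 0x80000000 for values that do not fit in a signed dword.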
Using shuffle: (not tested)
```
cvtps2dq xmm0, xmm0
pshufb xmm0, [shufmask]
movd somewhere, xmm0

shufmask: db 0, 4, 8, 12, 80h, 80h, 80h, 80h, 80h, 80h, 80h, 80h, 80h, 80h, 80h, 80h
```
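The pshufb variant above could look like this with intrinsics. It is a sketch under a few assumptions: GCC or Clang (the target attribute is compiler-specific; building with -mssse3 is the alternative), a CPU with SSSE3, and a made-up function name floats_to_bytes_shuffle. The mask mirrors shufmask: bytes 0, 4, 8 and 12 are selected, and 0x80 (written as -128) zeroes the remaining lanes.

```c
#include <tmmintrin.h>  /* SSSE3: _mm_shuffle_epi8 */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: convert four floats to four bytes by picking
   the low byte of each dword. The attribute enables SSSE3 for this
   function only (GCC/Clang-specific assumption). */
static __attribute__((target("ssse3")))
void floats_to_bytes_shuffle(const float in[4], uint8_t out[4])
{
    const __m128i mask = _mm_setr_epi8(0, 4, 8, 12,
                                       -128, -128, -128, -128,
                                       -128, -128, -128, -128,
                                       -128, -128, -128, -128);
    __m128i d = _mm_cvtps_epi32(_mm_loadu_ps(in)); /* cvtps2dq */
    __m128i b = _mm_shuffle_epi8(d, mask);         /* pshufb */
    int32_t packed = _mm_cvtsi128_si32(b);         /* movd */
    memcpy(out, &packed, sizeof packed);
}
```

Unlike the pack-based route, the shuffle does not saturate; it simply keeps the low byte of each dword, so 256 would come out as 0. That is fine for input already clamped to 0-255, as in the question.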