Load 8-bit uint8_t as uint32_t?

-上瘾入骨i 2021-01-01 03:26

My image-processing project works with grayscale images. I have an ARM Cortex-A8 processor platform and want to make use of NEON.

I have a grayscale image (consider

5 answers

情歌与酒 (original poster)

2021-01-01 03:57
Load the 4 bytes with a single-lane load instruction (vld1.32 {d0[0]}, [r0]) into the low half of a q-register, then use two move-long instructions (vmovl) to widen them first to 16 and then to 32 bits. The result should look something like this (GNU assembler syntax; r0 holds the address of the four pixels, written p in the comments):
    vld1.32   {d0[0]}, [r0]   @ now d0 = (p[0], p[1], p[2], p[3], junk, ..., junk)
    vmovl.u8  q0, d0          @ now q0 = (d0, d1) = ((uint16_t)p[0], ..., (uint16_t)p[3], junk, ..., junk)
    vmovl.u16 q0, d0          @ now q0 = ((uint32_t)p[0], ..., (uint32_t)p[3])

If you can guarantee that the address is 4-byte aligned, write [r0:32] in the load instruction instead, to save a cycle or two. If you do that and the address isn't aligned, however, you'll get a fault.

Um, I just realized you want to use intrinsics, not assembly, so here's the same thing with intrinsics (ptr is the const uint8_t * pointing at the four pixels):

    #include <arm_neon.h>

    uint32x2_t v8 = vdup_n_u32(0); // lane 0 will actually hold 4 uint8_t
    v8 = vld1_lane_u32((const uint32_t *)ptr, v8, 0);
    const uint16x4_t v16 = vget_low_u16(vmovl_u8(vreinterpret_u8_u32(v8))); // u8 -> u16, keep the 4 valid lanes
    const uint32x4_t v32 = vmovl_u16(v16);                                   // u16 -> u32
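For reference, here is a minimal self-contained sketch that wraps the intrinsic sequence in a function. Several details are assumptions, not part of the answer: the name widen4_u8_to_u32 and its signature are hypothetical; the four bytes are read with memcpy and placed with vset_lane_u32 instead of casting the uint8_t pointer for vld1_lane_u32, which avoids relying on C alignment and aliasing rules for arbitrary pixel addresses; a little-endian target is assumed (so byte lane 0 is the first pixel); and a scalar fallback is included so the code also compiles where NEON is unavailable.

```c
#include <stdint.h>
#include <string.h>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: zero-extend four consecutive 8-bit pixels at p into
 * four 32-bit values in out[0..3].
 * NEON path (assumes little-endian): put the 4 bytes in lane 0 of a 64-bit
 * register, then widen u8 -> u16 -> u32 with two vmovl steps.
 * Fallback path: plain scalar zero-extension for non-NEON hosts. */
void widen4_u8_to_u32(const uint8_t *p, uint32_t out[4])
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    uint32_t word;
    memcpy(&word, p, sizeof word);                   /* no alignment assumption on p */
    uint32x2_t v8 = vset_lane_u32(word, vdup_n_u32(0), 0); /* 4 pixels in lane 0, lane 1 zeroed */
    uint16x4_t v16 = vget_low_u16(vmovl_u8(vreinterpret_u8_u32(v8))); /* u8 -> u16, low 4 lanes */
    uint32x4_t v32 = vmovl_u16(v16);                 /* u16 -> u32 */
    vst1q_u32(out, v32);
#else
    for (int i = 0; i < 4; ++i)
        out[i] = (uint32_t)p[i];                     /* zero-extend, never sign-extend */
#endif
}
```

Compilers usually turn a 4-byte memcpy into a single load, so the NEON path should end up close to the vld1/vmovl/vmovl sequence, but it is worth checking the generated assembly on your toolchain if the cycle count matters. Note that the .u8/.u16 forms zero-extend, so a pixel value of 255 stays 255 in its 32-bit lane; the signed vmovl_s8/vmovl_s16 variants would sign-extend it instead.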