Load 8bit uint8_t as uint32_t?

-上瘾入骨i 2021-01-01 03:26

My image processing project works with grayscale images. I have an ARM Cortex-A8 platform and I want to make use of NEON.

I have a grayscale image( consider

5 answers
  •  情歌与酒
    2021-01-01 03:57

    Load the 4 bytes into one 32-bit lane of a d-register using a single-lane load instruction (vld1.32 {<register>[<lane>]}, [<address>]), then use two move-long instructions (vmovl) to promote them first to 16 and then to 32 bits. The result should be something like this (in GNU syntax, writing p[i] for the byte at <address> + i):

    vld1.32   {d0[0]}, [<address>]  @ Now d0 = (p[0], p[1], p[2], p[3], <junk>, ... <junk>)
    vmovl.u8  q0, d0                @ Now q0 = (d0, d1) = ((uint16_t)p[0], ... (uint16_t)p[3], <junk>, ... <junk>)
    vmovl.u16 q0, d0                @ Now q0 = (d0, d1) = ((uint32_t)p[0], ... (uint32_t)p[3])

    If you can guarantee that <address> is 4-byte aligned, then write [<address>:32] instead in the load instruction, to save a cycle or two. If you do that and the address isn't aligned, however, you'll get a fault.
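Whether the aligned form is safe depends on the pointer value at run time. As a rough sketch of how that guarantee could be checked, the helper below tests the low two address bits; the name `is_aligned4` is made up for illustration and is not part of any NEON API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: returns true when p is a multiple of 4,
 * i.e. when the two lowest address bits are both zero. */
bool is_aligned4(const void *p) {
    return ((uintptr_t)p & 3u) == 0;
}
```

A caller could pick the aligned load only when `is_aligned4(ptr)` holds and use the plain form otherwise.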

    Um, I just realized you want to use intrinsics, not assembly, so here's the same thing with intrinsics.

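    #include <arm_neon.h>

    uint32x2_t v8 = vdup_n_u32(0); // Lane 0 will actually hold 4 uint8_t
    // ptr points at the 4 pixels; vld1_lane_u32 expects a uint32_t pointer
    v8 = vld1_lane_u32((const uint32_t *)ptr, v8, 0);
    // 8 x u8 -> 8 x u16 (keep the low 4), then 4 x u16 -> 4 x u32
    const uint16x4_t v16 = vget_low_u16(vmovl_u8(vreinterpret_u8_u32(v8)));
    const uint32x4_t v32 = vmovl_u16(v16);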
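For trying the widening step outside a larger project, it could be wrapped in a small function like the sketch below. The function name `widen4_u8_to_u32` is hypothetical. Unlike the snippet above, this version fetches the 4 bytes with `memcpy` rather than a pointer cast (an assumption made here to avoid alignment and aliasing questions), it assumes little-endian lane order, and it includes a scalar fallback so that it also compiles on targets without NEON.

```c
#include <stdint.h>
#include <string.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper (not from the answer): widen 4 consecutive
 * uint8_t pixels at src into 4 uint32_t values at dst. */
void widen4_u8_to_u32(const uint8_t *src, uint32_t dst[4]) {
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    uint32_t packed;
    memcpy(&packed, src, sizeof packed);  /* alignment-safe 4-byte fetch */
    /* Assumes little-endian: byte 0 of packed ends up in u8 lane 0. */
    const uint8x8_t v8 = vreinterpret_u8_u32(vdup_n_u32(packed));
    /* 8 x u8 -> 8 x u16, keep the low 4 lanes, then 4 x u16 -> 4 x u32. */
    const uint16x4_t v16 = vget_low_u16(vmovl_u8(v8));
    const uint32x4_t v32 = vmovl_u16(v16);
    vst1q_u32(dst, v32);
#else
    /* Scalar fallback so the sketch also builds without NEON. */
    for (int i = 0; i < 4; ++i) {
        dst[i] = src[i];
    }
#endif
}
```

Calling it on a row of pixels four at a time, for example `widen4_u8_to_u32(row + x, out)`, leaves the zero-extended values in `out[0]` through `out[3]`.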
    
