Fastest de-interleave operation in C?

一个人的身影 2021-01-02 00:30

I have a pointer to an array of bytes mixed that contains the interleaved bytes of two distinct arrays array1 and array2. Say mi

6 answers
  •  滥情空心
    2021-01-02 01:11

    I recommend Graham's solution, but if this is really speed-critical and you are willing to drop down to assembly, you can go even faster.

    The idea is this:

    1. Read an entire 32-bit integer from mixed. You'll get 'a1b2'.

    2. Rotate the lower 16 bits by 8 bits to get '1ab2' (we are assuming little-endian, since this is the default on ARM and therefore on Apple A# chips, so the first two bytes shown are the lower ones).

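    3. Rotate the entire 32-bit register by 8 bits to get '21ab'. In this notation the bytes move to the right, but because the lowest byte is written first, the register instruction is a left rotation by 8 (equivalently, a right rotation by 24).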

    4. Rotate the lower 16 bits by 8 bits to get '12ab'.

    5. Write the lower 16 bits to array2.

    6. Rotate the entire 32-bit register by 16 bits.

    7. Write the lower 16 bits to array1.

    8. Advance array1 by 16 bits, array2 by 16 bits, and mixed by 32 bits.

    9. Repeat.
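    The steps above can be sketched in portable C before committing to assembly. This is only a sketch under stated assumptions: a little-endian target, a byte count that is a multiple of 4, and hypothetical helper names (`rot16_low`, `rotl32`, `deinterleave32`) that are not from the question. In C value terms, step 3 is a left rotation by 8, and each store writes the low 16 bits, since every iteration produces two bytes for each output array.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Rotate only the low 16 bits of x by 8 (i.e. swap the two low bytes). */
static uint32_t rot16_low(uint32_t x)
{
    uint32_t lo = x & 0xFFFFu;
    lo = ((lo << 8) | (lo >> 8)) & 0xFFFFu;
    return (x & 0xFFFF0000u) | lo;
}

/* Rotate a 32-bit value left by n bits; n must be in 1..31. */
static uint32_t rotl32(uint32_t x, unsigned n)
{
    return (x << n) | (x >> (32u - n));
}

/* Hypothetical helper: assumes a little-endian host and n % 4 == 0,
 * where n is the number of bytes in mixed. */
void deinterleave32(const uint8_t *mixed, uint8_t *array1,
                    uint8_t *array2, size_t n)
{
    for (size_t i = 0; i + 4 <= n; i += 4) {
        uint32_t w;
        uint16_t half;

        memcpy(&w, mixed + i, 4);          /* step 1: 'a1b2' */
        w = rot16_low(w);                  /* step 2: '1ab2' */
        w = rotl32(w, 8);                  /* step 3: '21ab' */
        w = rot16_low(w);                  /* step 4: '12ab' */

        half = (uint16_t)w;                /* step 5: low 16 bits -> array2 */
        memcpy(array2 + i / 2, &half, 2);

        w = rotl32(w, 16);                 /* step 6: 'ab12' */

        half = (uint16_t)w;                /* step 7: low 16 bits -> array1 */
        memcpy(array1 + i / 2, &half, 2);
    }                                      /* steps 8-9: i += 4, repeat */
}
```

    `memcpy` is used for the loads and stores so the sketch does not depend on alignment; whether a compiler turns the helpers into single rotate instructions depends on the target, so measure before assuming it beats the simple loop.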

    We have traded 2 memory reads (assuming we use Graham's version or an equivalent) and 4 memory writes for one memory read, two memory writes and 4 register operations. Although the number of operations has gone up from 6 to 7, register operations are faster than memory operations, so this should come out ahead. Also, since we read from mixed 32 bits at a time instead of 16, we cut the loop overhead in half.

    PS: In theory this can also be done on a 64-bit architecture, but doing all those rotations for 'a1b2c3d4' will drive you to madness.
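    For what it's worth, the 64-bit case gets more manageable if the rotations are swapped for a different technique: two masked swaps (an XOR-based "unshuffle"). This is not the rotation scheme described above, just a sketch of an alternative, assuming a little-endian host and a byte count that is a multiple of 8; `deinterleave64` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: assumes a little-endian host and n % 8 == 0,
 * where n is the number of bytes in mixed. */
void deinterleave64(const uint8_t *mixed, uint8_t *array1,
                    uint8_t *array2, size_t n)
{
    for (size_t i = 0; i + 8 <= n; i += 8) {
        uint64_t x, t;
        uint32_t half;

        memcpy(&x, mixed + i, 8);                    /* 'a1b2c3d4' */

        /* Swap bytes 1<->2 and 5<->6: 'ab12cd34'. */
        t = (x ^ (x >> 8)) & 0x0000FF000000FF00ULL;
        x ^= t ^ (t << 8);

        /* Swap the two middle 16-bit units: 'abcd1234'. */
        t = (x ^ (x >> 16)) & 0x00000000FFFF0000ULL;
        x ^= t ^ (t << 16);

        half = (uint32_t)x;                          /* low 32 bits -> array1 */
        memcpy(array1 + i / 2, &half, 4);

        half = (uint32_t)(x >> 32);                  /* high 32 bits -> array2 */
        memcpy(array2 + i / 2, &half, 4);
    }
}
```

    That is one read, two writes and a handful of register operations per 8 input bytes; as with the 32-bit version, whether it actually wins is something to benchmark on the target.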
