I have a pointer to an array of bytes mixed that contains the interleaved bytes of two distinct arrays array1 and array2. Say mi
I recommend Graham's solution, but if this is really speed-critical and you are willing to drop down to assembly, you can go even faster.
The idea is this:
Read an entire 32bit integer from mixed. You'll get 'a1b2'.
Rotate the lower 16 bits by 8 bits to get '1ab2'. (We are assuming little-endian, since this is the default on ARM and therefore on the Apple A-series chips, so the first two bytes are the lower ones.)
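Rotate the entire 32-bit register by 8 bits to get '21ab'. In the low-byte-first notation used here the bytes move one place to the right; on the register value itself that is a rotate left by 8, which on ARM (which only has ROR) is a rotate right by 24.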
Rotate the lower 16 bits by 8 bits to get '12ab'.
Write the lower 16 bits to array2.
Rotate the entire 32-bit register by 16 bits to get 'ab12'.
Write the lower 16 bits to array1.
Advance array1 by 16 bits, array2 by 16 bits, and mixed by 32 bits.
Repeat.
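The steps above can be sketched in portable C before committing to hand-written assembly. This is only an illustration under stated assumptions: the helper names `swap_low16`, `rotl32` and `deinterleave32` are made up for this example, the host is assumed to be little-endian, and `groups` is assumed to count whole 4-byte groups in mixed. Each store writes 16 bits, so that both bytes of a pair reach the output.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Swap the two bytes of the lower 16 bits; the upper 16 bits are untouched.
   This is the "rotate the lower 16 bits by 8" step. */
static uint32_t swap_low16(uint32_t w)
{
    return (w & 0xFFFF0000u) | ((w & 0x00FFu) << 8) | ((w >> 8) & 0x00FFu);
}

/* Rotate a 32-bit value left by n bits, n in 1..31
   (shifting by 32 would be undefined behaviour). */
static uint32_t rotl32(uint32_t w, unsigned n)
{
    assert(n > 0 && n < 32);
    return (w << n) | (w >> (32u - n));
}

/* Hypothetical helper: splits `groups` 4-byte groups of `mixed` into
   array1 (even bytes) and array2 (odd bytes). Assumes little-endian. */
static void deinterleave32(const uint8_t *mixed, uint8_t *array1,
                           uint8_t *array2, size_t groups)
{
    for (size_t i = 0; i < groups; i++) {
        uint32_t w;
        uint16_t half;

        memcpy(&w, mixed + 4 * i, 4);    /* one 32-bit read:  'a1b2' */
        w = swap_low16(w);               /*                   '1ab2' */
        w = rotl32(w, 8);                /*                   '21ab' */
        w = swap_low16(w);               /*                   '12ab' */

        half = (uint16_t)w;              /* lower 16 bits: '12' */
        memcpy(array2 + 2 * i, &half, 2);

        w = rotl32(w, 16);               /*                   'ab12' */
        half = (uint16_t)w;              /* lower 16 bits: 'ab' */
        memcpy(array1 + 2 * i, &half, 2);
    }
}
```

The `memcpy` calls are there to avoid alignment and aliasing problems; compilers typically turn them into single load and store instructions. Whether this actually beats the byte-by-byte loop is something to measure on the target device.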
We have traded 2 memory reads and 4 memory writes (assuming we use Graham's version or an equivalent) for one memory read, two memory writes and 4 register operations. Although the number of operations has gone up from 6 to 7, register operations are faster than memory operations, so it is more efficient this way. Also, since we read from mixed 32 bits at a time instead of 16, we cut the loop overhead in half.
PS: Theoretically this can also be done on a 64-bit architecture, but doing all those rotations for 'a1b2c3d4' will drive you to madness.
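For what it's worth, a 64-bit version gets more manageable if the rotations are swapped for a mask-and-shift gather, which is a different technique from the one described above. A rough sketch, again assuming a little-endian host, with the names `even_bytes` and `deinterleave64` invented for this example and `groups` counting whole 8-byte groups:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Gather bytes 0, 2, 4 and 6 of w into the low 32 bits of the result. */
static uint32_t even_bytes(uint64_t w)
{
    w &= 0x00FF00FF00FF00FFull;                    /* a 0 b 0 c 0 d 0 */
    w = (w | (w >> 8))  & 0x0000FFFF0000FFFFull;   /* a b 0 0 c d 0 0 */
    w = (w | (w >> 16)) & 0x00000000FFFFFFFFull;   /* a b c d 0 0 0 0 */
    return (uint32_t)w;
}

/* Hypothetical helper: splits `groups` 8-byte groups of `mixed` into
   array1 (even bytes) and array2 (odd bytes). Assumes little-endian. */
static void deinterleave64(const uint8_t *mixed, uint8_t *array1,
                           uint8_t *array2, size_t groups)
{
    for (size_t i = 0; i < groups; i++) {
        uint64_t w;
        uint32_t half;

        memcpy(&w, mixed + 8 * i, 8);    /* one 64-bit read: 'a1b2c3d4' */

        half = even_bytes(w);            /* 'abcd' */
        memcpy(array1 + 4 * i, &half, 4);

        half = even_bytes(w >> 8);       /* shift odd bytes into even slots: '1234' */
        memcpy(array2 + 4 * i, &half, 4);
    }
}
```

This costs more register operations per word than the 32-bit rotation scheme, so it is only worth considering if the wider loads and stores pay for themselves on the hardware in question.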