I have a pointer to an array of bytes mixed that contains the interleaved bytes of two distinct arrays array1 and array2. Say mi
Off the top of my head, I don't know of a library function for de-interleaving 2 channel byte data. However it's worth filing a bug report with Apple to request such a function.
In the meantime, it's pretty easy to vectorize such a function using NEON or SSE intrinsics. Specifically, on ARM you will want to use vld1q_u8 to load a vector from each source array, vuzpq_u8 to de-interleave them, and vst1q_u8 to store the resulting vectors; here's a rough sketch that I haven't tested or even tried to build, but it should illustrate the general idea. More sophisticated implementations are definitely possible (in particular, NEON can load/store two 16B registers in a single instruction, which the compiler may not do with this, and some amount of pipelining and/or unrolling may be beneficial depending on how long your buffers are):
#if defined __ARM_NEON__
# include
#endif
#include
#include
void deinterleave(uint8_t *mixed, uint8_t *array1, uint8_t *array2, size_t mixedLength) {
#if defined __ARM_NEON__
size_t vectors = mixedLength / 32;
mixedLength %= 32;
while (vectors --> 0) {
const uint8x16_t src0 = vld1q_u8(mixed);
const uint8x16_t src1 = vld1q_u8(mixed + 16);
const uint8x16x2_t dst = vuzpq_u8(src0, src1);
vst1q_u8(array1, dst.val[0]);
vst1q_u8(array2, dst.val[1]);
mixed += 32;
array1 += 16;
array2 += 16;
}
#endif
for (size_t i=0; i