I have an image buffer that I need to convert to another format. The origin image buffer is four channels, 8 bits per channel, Alpha, Red, Green, and Blue
You can do it in chunks of 4 pixels, moving 32 bits with unsigned long pointers. Just think that with 4 32 bits pixels you can construct by shifting and OR/AND, 3 words representing 4 24bits pixels, like this:
//col0 col1 col2 col3
//ARGB ARGB ARGB ARGB 32bits reading (4 pixels)
//BGRB GRBG RBGR 32 bits writing (4 pixels)
Shifting operations are always done by 1 instruction cycle in all modern 32/64 bits processors (barrel shifting technique) so its the fastest way of constructing those 3 words for writing, bitwise AND and OR are also blazing fast.
Like this:
//assuming we have 4 ARGB1 ... ARGB4 pixels and 3 32 bits words, W1, W2 and W3 to write
// and *dest its an unsigned long pointer for destination
W1 = ((ARGB1 & 0x000f) << 24) | ((ARGB1 & 0x00f0) << 8) | ((ARGB1 & 0x0f00) >> 8) | (ARGB2 & 0x000f);
*dest++ = W1;
and so on.... with next pixels in a loop.
You'll need some adjusting with images that are not multiple of 4, but I bet this is the fastest approach of all, without using assembler.
And btw, forget about using structs and indexed access, those are the SLOWER ways of all for moving data, just take a look at a disassembly listing of a compiled C++ program and you'll agree with me.