I am writing a piece of code designed to do some data compression on CLSID structures. I'm storing them as a compressed stream of 128-bit integers. However, the code in questio
This is probably as optimized as you'll get. Bit-twiddling operations are some of the fastest available on the processor.
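It may be slightly faster to shift the original value by fixed amounts (>> 8, >> 16, >> 24) instead of repeatedly doing >>= 8, since that cuts out the assignment back to the variable at each step.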
Also, I don't think you need the & mask: since you're casting to a BYTE (an unsigned 8-bit char on Windows), the value gets truncated to its low 8 bits anyway. (Correct me if I'm wrong.)
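To make the two suggestions concrete, here is a minimal sketch in C. The original code isn't shown here, so everything in it is assumed: the helper names store_u32_le and store_u32_le_masked are hypothetical, the 32-bit width is just for illustration, and BYTE is a stand-in typedef for the Windows one. The first function uses fixed shifts with no mask; the second uses the repeated >>= 8 with an explicit & 0xFF, for comparison.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the Windows BYTE typedef (unsigned char); assumed here
   so the sketch compiles without <windows.h>. */
typedef uint8_t BYTE;

/* Hypothetical helper: store the four bytes of v, low byte first,
   using independent shifts and relying on the cast to truncate. */
static void store_u32_le(uint32_t v, BYTE *out)
{
    out[0] = (BYTE)v;          /* conversion keeps only the low 8 bits */
    out[1] = (BYTE)(v >> 8);
    out[2] = (BYTE)(v >> 16);
    out[3] = (BYTE)(v >> 24);
}

/* Hypothetical comparison version: repeated in-place shift plus mask. */
static void store_u32_le_masked(uint32_t v, BYTE *out)
{
    for (size_t i = 0; i < 4; ++i) {
        out[i] = (BYTE)(v & 0xFF);
        v >>= 8;
    }
}

/* Self-check: both versions must produce identical bytes. */
void check_store_u32(uint32_t v)
{
    BYTE a[4], b[4];
    store_u32_le(v, a);
    store_u32_le_masked(v, b);
    for (size_t i = 0; i < 4; ++i)
        assert(a[i] == b[i]);
}
```

Converting to an unsigned 8-bit type is defined to reduce the value modulo 256, which is why the mask can go. A decent optimizer will quite possibly emit the same instructions for both versions, so treat this as a readability tweak until a profiler says otherwise.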
All in all, though, these are really minor changes. Profile it to see if it actually makes a difference :P