Websocket data unmasking / multi byte xor
websocket spec defines unmasking data as j = i MOD 4 transformed-octet-i = original-octet-i XOR masking-key-octet-j where mask is 4 bytes long and unmasking has to be applied per byte. Is there a way to do this more efficiently, than to just loop bytes? Server running the code can assumed to be a Haswell CPU, OS is Linux with kernel > 3.2, so SSE etc are all present. Coding is done in C, but I can do asm as well if necessary. I'd tried to look up the solution myself, but was unable to figure out if there was an appropriate instruction in any of the dozens of SSE1-5/AVE/(whatever extension -