WebSocket data unmasking / multi-byte XOR

Submitted by 泪湿孤枕 on 2019-12-07 13:38:26

Question


The WebSocket spec (RFC 6455) defines unmasking the data as:

j                   = i MOD 4
transformed-octet-i = original-octet-i XOR masking-key-octet-j

where the mask is 4 bytes long and the unmasking has to be applied per byte.
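For reference, the plain byte loop that this formula describes looks roughly like the following in C. This is a minimal sketch; the function name `unmask_scalar` is hypothetical. Since XOR is its own inverse, the same loop both masks and unmasks.

```c
#include <stddef.h>
#include <stdint.h>

// Per-byte unmasking, exactly as the spec formula reads:
// byte i of the payload is XORed with key[i MOD 4].
void unmask_scalar(uint8_t *data, size_t len, const uint8_t key[4])
{
    for (size_t i = 0; i < len; i++)
        data[i] ^= key[i & 3];   // i & 3 == i % 4 for unsigned i
}
```

As a usage example, the masked bytes `7f 9f 4d 51 58` with key `37 fa 21 3d` (the masked "Hello" frame from the examples in RFC 6455, section 5.7) unmask to the ASCII text "Hello".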

Is there a way to do this more efficiently than just looping over the bytes?

The server running the code can be assumed to have a Haswell CPU (so SSE, AVX etc. are all available) and to run Linux with kernel > 3.2. The code is written in C, but I can write asm as well if necessary.

I tried to look up the solution myself, but was unable to figure out whether there was an appropriate instruction in any of the dozens of extensions (SSE1-5/AVX/whatever; I've lost track of the many added over the years).

Thank you very much!

Edit: After rereading the spec a couple of times, it seems that it's actually just XOR'ing the data bytes with the mask bytes, which I can do 8 bytes at a time until the last few bytes. The question is still open, as I think there could still be a way to optimize this using SSE or the like (maybe processing 16 bytes at a time? letting the processor do the loop?).
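The 8-bytes-at-a-time idea from the edit can be sketched as follows. This is an illustration under stated assumptions, not the asker's code: the name `unmask_u64` is hypothetical, and `memcpy` is used for the 8-byte loads and stores so the buffer needs no particular alignment (compilers typically turn these into single moves). Because 8 is a multiple of 4, repeating the key twice keeps byte i lined up with key[i MOD 4].

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Unmask 8 bytes per step using a 64-bit word that holds the key twice,
// then finish the last 0..7 bytes one at a time.
void unmask_u64(uint8_t *data, size_t len, const uint8_t key[4])
{
    uint8_t key8[8];
    for (int j = 0; j < 8; j++)
        key8[j] = key[j & 3];          // key bytes in wire order, repeated twice
    uint64_t k64;
    memcpy(&k64, key8, 8);             // byte order matches memory, any endianness

    size_t i = 0;
    for (; i + 8 <= len; i += 8) {
        uint64_t w;
        memcpy(&w, data + i, 8);       // unaligned-safe 8-byte load
        w ^= k64;
        memcpy(data + i, &w, 8);       // unaligned-safe 8-byte store
    }
    for (; i < len; i++)               // tail: i is a multiple of 8 here
        data[i] ^= key[i & 3];
}
```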


Answer 1:


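Yes, you can XOR 16 bytes in one instruction using SSE2, or 32 bytes at a time with AVX2 (Haswell and later). Because 16 and 32 are both multiples of 4, the 4-byte mask only needs to be repeated across the whole register; byte i of the data then still meets mask byte i MOD 4.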

SSE2:

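#include <emmintrin.h>                     // SSE2 intrinsics
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// buff must be 16 byte aligned and N must be a multiple of 16
void unmask_sse2(uint8_t *buff, size_t N, const uint8_t key[4])
{
    uint32_t k;
    memcpy(&k, key, 4);                    // 4 mask bytes in wire order
    __m128i v_mask = _mm_set1_epi32((int)k);   // mask repeated 4 times (16 bytes)

    for (size_t i = 0; i < N; i += 16)
    {
        __m128i v = _mm_load_si128((__m128i *)&buff[i]);  // load 16 bytes
        v = _mm_xor_si128(v, v_mask);                     // XOR with mask
        _mm_store_si128((__m128i *)&buff[i], v);          // store 16 unmasked bytes
    }
}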
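The SSE2 loop requires an aligned buffer whose length is a multiple of 16, and a WebSocket payload usually satisfies neither. A minimal sketch that lifts both restrictions (the name `unmask_sse2_any` is hypothetical) uses the unaligned load/store intrinsics and finishes the last 0 to 15 bytes with the scalar loop. It assumes x86, which is little-endian, so copying the 4 key bytes into a `uint32_t` and broadcasting it with `_mm_set1_epi32` lays them out in wire order in every lane.

```c
#include <emmintrin.h>   // SSE2 intrinsics
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// SSE2 unmasking for any alignment and any length:
// 16 bytes per iteration with unaligned loads/stores, then a scalar tail.
void unmask_sse2_any(uint8_t *data, size_t len, const uint8_t key[4])
{
    uint32_t k;
    memcpy(&k, key, 4);                            // key bytes in wire order
    const __m128i vmask = _mm_set1_epi32((int)k);  // key repeated 4 times

    size_t i = 0;
    for (; i + 16 <= len; i += 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)(data + i));
        v = _mm_xor_si128(v, vmask);
        _mm_storeu_si128((__m128i *)(data + i), v);
    }
    for (; i < len; i++)                           // tail: i is a multiple of 16
        data[i] ^= key[i & 3];
}
```

On recent x86 cores the unaligned load is generally about as fast as the aligned one when the address happens to be aligned, so insisting on alignment buys little here; the same structure carries over to AVX2 by swapping in the `_mm256_loadu_si256` / `_mm256_xor_si256` / `_mm256_storeu_si256` equivalents.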

AVX2:

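#include <immintrin.h>                     // AVX2 intrinsics (compile with -mavx2)
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// buff must be 32 byte aligned (otherwise use _mm256_loadu_si256 and
// _mm256_storeu_si256), and N must be a multiple of 32
void unmask_avx2(uint8_t *buff, size_t N, const uint8_t key[4])
{
    uint32_t k;
    memcpy(&k, key, 4);                         // 4 mask bytes in wire order
    __m256i w_mask = _mm256_set1_epi32((int)k); // mask repeated 8 times (32 bytes)

    for (size_t i = 0; i < N; i += 32)
    {
        __m256i w = _mm256_load_si256((__m256i *)&buff[i]);  // load 32 bytes
        w = _mm256_xor_si256(w, w_mask);                     // XOR with mask
        _mm256_store_si256((__m256i *)&buff[i], w);          // store 32 unmasked bytes
    }
}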
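Both vector loops broadcast the key unchanged, which is only correct when the buffer starts at a payload offset that is a multiple of 4. If a frame's payload is read and unmasked in several pieces, the formula j = i MOD 4 means each piece must start with the key rotated by its offset into the payload. A sketch under that assumption, with a hypothetical helper `unmask_at_offset` (x86 only, like the SSE2 code):

```c
#include <emmintrin.h>   // SSE2 intrinsics
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Unmask a piece that begins `offset` bytes into the payload. Payload byte
// offset + k needs key[(offset + k) % 4], so rotate the key by offset % 4
// and then treat the piece as if it started at offset 0.
void unmask_at_offset(uint8_t *data, size_t len, const uint8_t key[4],
                      uint64_t offset)
{
    uint8_t rk[4];
    for (int j = 0; j < 4; j++)
        rk[j] = key[(offset + j) & 3];             // rotated key

    uint32_t k;
    memcpy(&k, rk, 4);
    const __m128i vmask = _mm_set1_epi32((int)k);  // rotated key, 4 times

    size_t i = 0;
    for (; i + 16 <= len; i += 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)(data + i));
        _mm_storeu_si128((__m128i *)(data + i), _mm_xor_si128(v, vmask));
    }
    for (; i < len; i++)                           // scalar tail
        data[i] ^= rk[i & 3];
}
```

For example, a 50-byte payload received as 7, 22 and 21 bytes would be unmasked with offsets 0, 7 and 29.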


Source: https://stackoverflow.com/questions/17742741/websocket-data-unmasking-multi-byte-xor
