Optimizing SIMD histogram calculation
I wrote code that computes a histogram from an OpenCV IplImage * struct and an unsigned int * buffer that receives the histogram. I'm still new to SIMD, so I might not be taking advantage of the full potential the instruction set provides.

    histogramASM:
        xor rdx, rdx
        xor rax, rax
        mov eax, dword [imgPtr + imgWidthOffset]
        mov edx, dword [imgPtr + imgHeightOffset]
        mul rdx
        mov rdx, rax                                ; rdx = image size
        mov r10, qword [imgPtr + imgDataOffset]     ; r10 = image data
    NextPacket:
        mov rax, rdx
        movdqu xmm0, [r10 + rax - 16]
        mov rcx, 16                                 ; 16 pixels per packet
    PacketLoop:
        pextrb rbx, xmm0, 0                         ; saving the pixel