Profiling suggests that this function here is a real bottle neck for my application:
static inline int countEqualChars(const char* string1, const char* strin
Auto-vectorization in current gcc is a matter of helping the compiler to understand that's easy to vectorize the code. In your case: it will understand the vectorization request if you remove the conditional and rewrite the code in a more imperative way:
static inline int count(const char* string1, const char* string2, int size) {
int r = 0;
bool b;
for (int j = 0; j < size; ++j) {
b = (string1[j] == string2[j]);
r += b;
}
return r;
}
In this case:
movdqa 16(%rsp), %xmm1
movl $.LC2, %esi
pxor %xmm2, %xmm2
movzbl 416(%rsp), %edx
movdqa .LC1(%rip), %xmm3
pcmpeqb 224(%rsp), %xmm1
cmpb %dl, 208(%rsp)
movzbl 417(%rsp), %eax
movl $1, %edi
pand %xmm3, %xmm1
movdqa %xmm1, %xmm5
sete %dl
movdqa %xmm1, %xmm4
movzbl %dl, %edx
punpcklbw %xmm2, %xmm5
punpckhbw %xmm2, %xmm4
pxor %xmm1, %xmm1
movdqa %xmm5, %xmm6
movdqa %xmm5, %xmm0
movdqa %xmm4, %xmm5
punpcklwd %xmm1, %xmm6
(etc.)