Can counting byte matches between two strings be optimized using SIMD?

悲哀的现实 2021-01-05 01:34

Profiling suggests that this function is a real bottleneck in my application:

static inline int countEqualChars(const char* string1, const char* strin         


        
3 Answers
  •  轻奢々 (OP)
    2021-01-05 02:14

    Auto-vectorization in current GCC is largely a matter of helping the compiler see that the code is easy to vectorize. In your case, it will recognize the vectorization opportunity if you remove the conditional and rewrite the loop so that the comparison result is added to the counter unconditionally:

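        #include <stdbool.h>  /* bool in C99 and later */

        /* Counts positions j in [0, size) where string1[j] == string2[j]. */
        static inline int count(const char* string1, const char* string2, int size) {
                int r = 0;

                for (int j = 0; j < size; ++j) {
                        /* No branch: the comparison yields 0 or 1 and is
                           added unconditionally, which GCC can vectorize. */
                        bool b = (string1[j] == string2[j]);
                        r += b;
                }

                return r;
        }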
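    For comparison, here is a hypothetical branchy loop next to the answer's branch-free one. The question's original function body is truncated, so `count_branchy` is only an assumed form with an `if` inside the loop, and both function names are illustrative, not from the question. The two functions return the same counts; the difference is only in how easily the compiler can turn the loop body into packed compares and adds.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical branchy form, for comparison only: the question's
   original body is truncated, so this shape is an assumption. */
static int count_branchy(const char *s1, const char *s2, int size)
{
    assert(size >= 0);
    int r = 0;
    for (int j = 0; j < size; ++j) {
        if (s1[j] == s2[j])   /* conditional increment */
            ++r;
    }
    return r;
}

/* Branch-free form from the answer: the comparison result (0 or 1)
   is added unconditionally on every iteration. */
static int count_branchless(const char *s1, const char *s2, int size)
{
    assert(size >= 0);
    int r = 0;
    for (int j = 0; j < size; ++j) {
        bool b = (s1[j] == s2[j]);
        r += b;
    }
    return r;
}
```

    Compiling with `gcc -O3 -fopt-info-vec-optimized` makes GCC report which loops it vectorized; the outcome for each version can vary with the GCC version and target flags.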
    

    With the branch-free count function above, GCC emits vectorized SSE2 code (excerpt):

        movdqa  16(%rsp), %xmm1
        movl    $.LC2, %esi
        pxor    %xmm2, %xmm2
        movzbl  416(%rsp), %edx
        movdqa  .LC1(%rip), %xmm3
        pcmpeqb 224(%rsp), %xmm1
        cmpb    %dl, 208(%rsp)
        movzbl  417(%rsp), %eax
        movl    $1, %edi
        pand    %xmm3, %xmm1
        movdqa  %xmm1, %xmm5
        sete    %dl
        movdqa  %xmm1, %xmm4
        movzbl  %dl, %edx
        punpcklbw   %xmm2, %xmm5
        punpckhbw   %xmm2, %xmm4
        pxor    %xmm1, %xmm1
        movdqa  %xmm5, %xmm6
        movdqa  %xmm5, %xmm0
        movdqa  %xmm4, %xmm5
        punpcklwd   %xmm1, %xmm6
    

    (etc.)
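    If relying on the auto-vectorizer is not enough, the same count can be written with explicit SSE2 intrinsics. This is a different technique from the compiler output above: instead of widening the 0/1 bytes into integer accumulators, it collapses each 16-byte compare result into a bitmask with `_mm_movemask_epi8` and counts the set bits. A minimal sketch, assuming an x86/x86-64 target with SSE2 and GCC or Clang (for `__builtin_popcount`); the name `count_equal_sse2` is hypothetical.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Counts positions i in [0, size) where s1[i] == s2[i].
   Processes 16 bytes per iteration; the remainder uses a scalar loop. */
static int count_equal_sse2(const char *s1, const char *s2, int size)
{
    assert(size >= 0);
    int r = 0;
    int i = 0;
    for (; i + 16 <= size; i += 16) {
        /* Unaligned loads, so the inputs need no special alignment. */
        __m128i a = _mm_loadu_si128((const __m128i *)(s1 + i));
        __m128i b = _mm_loadu_si128((const __m128i *)(s2 + i));
        /* Each byte becomes 0xFF where the inputs match, 0x00 otherwise. */
        __m128i eq = _mm_cmpeq_epi8(a, b);
        /* Gather the top bit of every byte into a 16-bit mask... */
        int mask = _mm_movemask_epi8(eq);
        /* ...and count the set bits (GCC/Clang builtin). */
        r += __builtin_popcount((unsigned)mask);
    }
    for (; i < size; ++i)   /* scalar tail: fewer than 16 bytes left */
        r += (s1[i] == s2[i]);
    return r;
}
```

    One popcount per 16 bytes keeps the code simple. For long inputs, a common alternative is to subtract the 0xFF/0x00 compare results from a byte counter with `_mm_sub_epi8` and periodically fold it into 64-bit sums with `_mm_sad_epu8`, flushing before any byte counter can reach 255.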
