What is the fastest way to return the positions of all set bits in a 64-bit integer?

北荒 2020-12-13 04:05

I need a fast way to get the positions of all one bits in a 64-bit integer. For example, given x = 123703, I'd like to fill an array idx[] = {0, 1, 2, 4,

10 Answers
  •  粉色の甜心
    2020-12-13 04:17

    Using char won't make this faster; it often costs extra AND operations and sign/zero extension during the arithmetic. Smaller integer types only pay off for very large arrays that would otherwise not fit in cache.

    Another thing you can improve is the COPY macro. Instead of copying byte by byte, copy a whole word at a time where possible:

    static inline void COPY(unsigned char *dst, const unsigned char *src, int n)
    {
        /* remember to align dst and src when declaring them */
        switch (n) {
        case 8:
            *((int64_t*)dst) = *((const int64_t*)src);
            break;
        case 7:
            *((int32_t*)dst) = *((const int32_t*)src);
            *((int16_t*)(dst + 4)) = *((const int16_t*)(src + 4));
            dst[6] = src[6];
            break;
        case 6:
            *((int32_t*)dst) = *((const int32_t*)src);
            *((int16_t*)(dst + 4)) = *((const int16_t*)(src + 4));
            break;
        case 5:
            *((int32_t*)dst) = *((const int32_t*)src);
            dst[4] = src[4];
            break;
        case 4:
            *((int32_t*)dst) = *((const int32_t*)src);
            break;
        case 3:
            *((int16_t*)dst) = *((const int16_t*)src);
            dst[2] = src[2];
            break;
        case 2:
            *((int16_t*)dst) = *((const int16_t*)src);
            break;
        case 1:
            dst[0] = src[0];
            break;
        case 0:
            break;
        }
    }
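
    The pointer casts only work if dst and src are suitably aligned, and strictly speaking they break C's aliasing rules. A minimal sketch of an alternative, assuming both buffers always have at least 8 bytes available at the current position (so you have to pad them): copy a full word every time and only advance by n. copy_upto8 is a hypothetical name, not something from the question.

```c
#include <stdint.h>
#include <string.h>

/* Copies up to 8 bytes from src to dst and returns the advanced dst.
 * ASSUMPTION: at least 8 bytes are readable at src and writable at dst,
 * so a full-word copy is always safe even when n < 8. */
static inline unsigned char *copy_upto8(unsigned char *dst,
                                        const unsigned char *src, int n)
{
    memcpy(dst, src, 8);  /* fixed size, so compilers usually emit one load and one store */
    return dst + n;       /* bytes past n are overwritten by the next call */
}
```

    This removes the switch entirely; whether it is actually faster for you is something to measure.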
    

    Also, since tabofs[x] and n[x] are usually accessed right after each other, try placing them next to each other in memory so that both are in cache at the same time:

    struct TAB_N
    {
        int16_t n, tabofs;   /* count and table offset for one byte value */
    } tab_n[256];            /* an array variable, not a typedef */
    
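    /* COPY takes dst by value, so advance dst by hand after each call */
    src = tab0 + tab_n[b0].tabofs; COPY(dst, src, tab_n[b0].n); dst += tab_n[b0].n;
    src = tab0 + tab_n[b1].tabofs; COPY(dst, src, tab_n[b1].n); dst += tab_n[b1].n;
    src = tab0 + tab_n[b2].tabofs; COPY(dst, src, tab_n[b2].n); dst += tab_n[b2].n;
    src = tab0 + tab_n[b3].tabofs; COPY(dst, src, tab_n[b3].n); dst += tab_n[b3].n;
    src = tab0 + tab_n[b4].tabofs; COPY(dst, src, tab_n[b4].n); dst += tab_n[b4].n;
    src = tab0 + tab_n[b5].tabofs; COPY(dst, src, tab_n[b5].n); dst += tab_n[b5].n;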
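
    To show the idea end to end, here is a rough self-contained sketch of a per-byte lookup where the count and the positions sit in the same table entry. It is not the code from the question: byte_entry, byte_tab, init_byte_tab and bit_positions are made-up names, and it uses one table and adds the byte offset at run time, which is a simplification on my part.

```c
#include <stdint.h>
#include <string.h>

/* One entry per byte value: count and positions are adjacent in memory,
 * so a lookup touches one small contiguous region. */
typedef struct {
    uint8_t n;       /* number of set bits in the byte */
    uint8_t pos[8];  /* positions of the set bits, ascending; unused slots are 0 */
} byte_entry;

static byte_entry byte_tab[256];

/* Fill the table once before the first call to bit_positions(). */
static void init_byte_tab(void)
{
    for (int b = 0; b < 256; b++) {
        int n = 0;
        memset(byte_tab[b].pos, 0, sizeof byte_tab[b].pos);
        for (int bit = 0; bit < 8; bit++)
            if (b & (1 << bit))
                byte_tab[b].pos[n++] = (uint8_t)bit;
        byte_tab[b].n = (uint8_t)n;
    }
}

/* Writes the positions of all set bits of x to idx (room for 64 entries)
 * and returns how many were written. */
static int bit_positions(uint64_t x, uint8_t *idx)
{
    uint8_t *dst = idx;
    for (int byte = 0; byte < 8; byte++) {
        const byte_entry *e = &byte_tab[(x >> (8 * byte)) & 0xFF];
        for (int i = 0; i < e->n; i++)
            dst[i] = (uint8_t)(e->pos[i] + 8 * byte);  /* shift into the 64-bit range */
        dst += e->n;
    }
    return (int)(dst - idx);
}
```

    For x = 123703 (0x1E337) this yields 11 positions starting with 0, 1, 2, 4.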
    

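    Last but not least, gettimeofday is not meant for performance measurement: it reports wall-clock time, which can jump when the system clock is adjusted. On Windows, use QueryPerformanceCounter instead, it's much more precise. On POSIX systems the counterpart is clock_gettime with CLOCK_MONOTONIC.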
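
    Since the question uses gettimeofday, you are probably not on Windows, so here is a minimal sketch of a monotonic timer for POSIX systems. now_ns is a hypothetical helper name; on Windows you would build the same thing from QueryPerformanceCounter and QueryPerformanceFrequency.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L  /* expose clock_gettime under strict -std=c99/c11 */
#endif
#include <stdint.h>
#include <time.h>

/* Returns a monotonic timestamp in nanoseconds (POSIX only).
 * Unlike gettimeofday, CLOCK_MONOTONIC does not jump when the
 * wall clock is adjusted. */
static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Typical use:
 *     uint64_t t0 = now_ns();
 *     ... run the code under test many times ...
 *     uint64_t elapsed = now_ns() - t0;
 */
```

    Run the measured code in a loop of many iterations and divide, since a single call is too short to time reliably.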
