Fastest way to scan for bit pattern in a stream of bits

后端 未结 13 1743
悲哀的现实
悲哀的现实 2020-12-07 13:38

I need to scan for a 16 bit word in a bit stream. It is not guaranteed to be aligned on byte or word boundaries.

What is the fastest way of achieving this

相关标签:
13条回答
  • 2020-12-07 14:39

    A fast way to find the matches in big bitstrings would be to calculate a lookup table that shows the bit offsets where a given input byte matches the pattern. Then combining three consecutive offset matches together you can get a bit vector that shows which offsets match the whole pattern. For example if byte x matches first 3 bits of the pattern, byte x+1 matches bits 3..11 and byte x+2 matches bits 11..16, then there is a match at byte x + 5 bits.

    Here's some example code that does this, accumulating the results for two bytes at a time:

    void find_matches(unsigned char* sequence, int n_sequence, unsigned short pattern) {
        if (n_sequence < 2)
            return; // 0 and 1 byte bitstring can't match a short
    
        // Calculate a lookup table that shows for each byte at what bit offsets
        // the pattern could match.
        unsigned int match_offsets[256];
        for (unsigned int in_byte = 0; in_byte < 256; in_byte++) {
            match_offsets[in_byte] = 0xFF;
            for (int bit = 0; bit < 24; bit++) {
                match_offsets[in_byte] <<= 1;
                unsigned int mask = (0xFF0000 >> bit) & 0xFFFF;
                unsigned int match_location = (in_byte << 16) >> bit;
                match_offsets[in_byte] |= !((match_location ^ pattern) & mask);
            }
        }
    
        // Go through the input 2 bytes at a time, looking up where they match and
        // anding together the matches offsetted by one byte. Each bit offset then
        // shows if the input sequence is consistent with the pattern matching at
        // that position. This is anded together with the large offsets of the next
        // result to get a single match over 3 bytes.
        unsigned int curr, next;
        curr = 0;
        for (int pos = 0; pos < n_sequence-1; pos+=2) {
            next = ((match_offsets[sequence[pos]] << 8) | 0xFF) & match_offsets[sequence[pos+1]];
            unsigned short match = curr & (next >> 16);
            if (match)
                output_match(pos, match);
            curr = next;
        }
        // Handle the possible odd byte at the end
        if (n_sequence & 1) {
            next = (match_offsets[sequence[n_sequence-1]] << 8) | 0xFF;
            unsigned short match = curr & (next >> 16);
            if (match)
                output_match(n_sequence-1, match);
        }
    }
    
    void output_match(int pos, unsigned short match) {
        for (int bit = 15; bit >= 0; bit--) {
            if (match & 1) {
                printf("Bitstring match at byte %d bit %d\n", (pos-2) + bit/8, bit % 8);
            }
            match >>= 1;
        }
    }
    

    The main loop of this is 18 instructions long and processes 2 bytes per iteration. If the setup cost isn't an issue, this should be about as fast as it gets.

    0 讨论(0)
提交回复
热议问题