Fastest way to scan for bit pattern in a stream of bits

悲哀的现实 2020-12-07 13:38

I need to scan for a 16 bit word in a bit stream. It is not guaranteed to be aligned on byte or word boundaries.

What is the fastest way of achieving this?

13 answers
  •  [愿得一人]
    2020-12-07 14:24

    I would like to suggest a solution using 3 lookup tables of size 256. This is efficient for large bit streams. The solution examines a sample of 3 bytes at a time. The following figure shows all possible arrangements of a 16-bit pattern within 3 bytes. Each byte region is shown in a different color.

    [Figure: the possible arrangements of a 16-bit pattern within 3 bytes] http://img70.imageshack.us/img70/8711/80541519.jpg

    Here the first sample takes care of checking bit positions 1 to 8, the next sample takes care of positions 9 to 16, and so on. When searching for a Pattern, we find all 8 possible arrangements of this Pattern and store them in 3 lookup tables (Left, Middle and Right).

    Initializing Lookup Tables:

    Let's take 0111011101110111 as an example Pattern to find. Now consider the 4th arrangement. Its left part would be XXX01110. Fill all rows of the Left lookup table indexed by the left part (XXX01110) with 00010000. The 1 bit indicates the starting position of this arrangement of the input Pattern. Thus the following 8 rows of the Left lookup table would be filled with 16 (00010000).

    00001110
    00101110
    01001110
    01101110
    10001110
    10101110
    11001110
    11101110
    

    The middle part of the arrangement would be 11101110. The row at this index (238) in the Middle lookup table will be filled with 16 (00010000).

    The right part of the arrangement would be 111XXXXX. All 32 rows with index 111XXXXX will be filled with 16 (00010000).
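
    Since the figure is only linked, the arrangements can also be sketched in code. This is a minimal illustration, not part of the tested code in this answer: the function name arrangement and the bits/care arrays are made up here, and shift 0 to 7 is assumed to correspond to the 1st to 8th arrangement.

    ```c
    #include <stdint.h>

    /* Hypothetical helper: describes one arrangement of a 16-bit pattern
       inside a 3-byte sample. shift is 0..7 (the 4th arrangement is shift 3).
       bits[k] holds the pattern bits that fall into byte k of the sample;
       care[k] has 1s where byte k is fixed by the pattern, 0s for the X bits. */
    static void arrangement(uint16_t pattern, unsigned shift,
                            uint8_t bits[3], uint8_t care[3])
    {
        /* Place the pattern in a 24-bit value, 'shift' bits below the top. */
        uint32_t value = (uint32_t)pattern << (8 - shift);
        uint32_t mask  = (uint32_t)0xFFFFu << (8 - shift);

        for (int k = 0; k < 3; k++) {
            bits[k] = (uint8_t)(value >> (16 - 8 * k));
            care[k] = (uint8_t)(mask  >> (16 - 8 * k));
        }
    }
    ```

    For the Pattern 0111011101110111 (0x7777) and shift 3 this gives bits = {0x0E, 0xEE, 0xE0} and care = {0x1F, 0xFF, 0xE0}, which is XXX01110 11101110 111XXXXX. A byte value v belongs to part k of the arrangement when (v & care[k]) == bits[k].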

    We must not overwrite elements in a lookup table while filling it. Instead, do a bitwise OR to update a row that is already filled. In the above example, rows written by the 3rd arrangement are also updated by the 7th arrangement, as follows.

    The 3rd arrangement is XX011101 11011101 11XXXXXX and sets bit 00100000. The 7th arrangement is XXXXXX01 11011101 110111XX and sets bit 00000010. Thus the rows with index XX011101 in the Left lookup table, the row 11011101 in the Middle lookup table and the rows 110111XX in the Right lookup table end up with both of these bits set (00100010).

    Searching Pattern:

    Take a sample of three bytes. Compute Count as follows, where Left, Middle and Right are the left, middle and right lookup tables.

    Count = Left[Byte0] & Middle[Byte1] & Right[Byte2];
    

    The number of 1 bits in Count gives the number of Pattern matches in the sample, and the position of each 1 bit gives the starting position of that match.
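
    Counting the 1 bits can be sketched as below. The search loop further down calls a helper named GetNumberOfOnes that is not shown in this answer; count_ones here is a hypothetical stand-in, assuming Count is an 8-bit value.

    ```c
    #include <stdint.h>

    /* Hypothetical helper: counts the 1 bits in an 8-bit Count value.
       Each iteration clears the lowest set bit, so the loop runs once
       per match rather than once per bit. */
    static int count_ones(uint8_t count)
    {
        int ones = 0;
        while (count) {
            count &= (uint8_t)(count - 1);  /* drop the lowest set bit */
            ones++;
        }
        return ones;
    }
    ```

    For example, a Count of 00100010 means two matches in the sample, so count_ones(0x22) returns 2.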

    Below is some sample code that has been tested.

    Initializing lookup table:

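        /* Assumed declarations (not shown in the original): unsigned char Left[256],
           Middle[256], Right[256], all zeroed; Pattern is the 16-bit word;
           MSB = Pattern >> 8 and LSB = Pattern & 0xFF. */
        for( RightShift = 0; RightShift < 8; RightShift++ )
        {
            LeftShift = 8 - RightShift;

            /* Bit that marks this arrangement in the table entries. */
            Starting = 128 >> RightShift;

            /* Left table: the top RightShift bits are don't-care. */
            Byte = MSB >> RightShift;
            Count = 0xFF >> LeftShift;
            for( i = 0; i <= Count; i++ )
            {
                Index = (( i << LeftShift ) | Byte ) & 0xFF;
                Left[Index] |= Starting;
            }

            /* Right table: the low LeftShift bits are don't-care.
               Mask to 8 bits so the index stays in range when LeftShift is 8. */
            Byte = ( LSB << LeftShift ) & 0xFF;
            Count = 0xFF >> RightShift;
            for( i = 0; i <= Count; i++ )
            {
                Index = i | Byte;
                Right[Index] |= Starting;
            }

            /* Middle table: exactly one entry per arrangement. */
            Index = ( unsigned char )(( Pattern >> RightShift ) & 0xFF );
            Middle[Index] |= Starting;
        }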
    

    Searching Pattern:

    Data is the stream buffer; Left, Middle and Right are the left, middle and right lookup tables.

    for( int Index = 1; Index < ( StreamLength - 1); Index++ )
    {
        /* Each set bit in Count marks a match that starts in Data[Index - 1]. */
        Count = Left[Data[Index - 1]] & Middle[Data[Index]] & Right[Data[Index + 1]];

        if( Count )
        {
            TotalCount += GetNumberOfOnes( Count );
        }
    }
    

    Limitation:

    The above loop cannot detect a Pattern that occupies exactly the last two bytes of the stream buffer. The following code needs to be added after the loop to overcome this limitation.

    Count = Left[Data[StreamLength - 2]] & Middle[Data[StreamLength - 1]] & 128;
    
    if( Count )
    {
        TotalCount += GetNumberOfOnes( Count );
    }
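
    Putting the pieces together, the whole method might look like the self-contained sketch below. It follows the same scheme as the fragments above, but the names PatternTables, build_tables and count_matches are made up for illustration, and MSB and LSB are assumed to be the high and low bytes of the Pattern.

    ```c
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* Hypothetical container for the three lookup tables. */
    typedef struct {
        uint8_t left[256], middle[256], right[256];
    } PatternTables;

    /* Fill the tables for a 16-bit pattern, one marker bit per arrangement. */
    static void build_tables(PatternTables *t, uint16_t pattern)
    {
        memset(t, 0, sizeof *t);
        for (unsigned shift = 0; shift < 8; shift++) {
            uint8_t start = (uint8_t)(0x80u >> shift);
            /* Fixed low bits of the left byte, fixed high bits of the right byte. */
            unsigned hi = (unsigned)(pattern >> 8) >> shift;
            unsigned lo = ((pattern & 0xFFu) << (8 - shift)) & 0xFFu;

            for (unsigned i = 0; i <= (0xFFu >> (8 - shift)); i++)
                t->left[((i << (8 - shift)) | hi) & 0xFFu] |= start;
            for (unsigned i = 0; i <= (0xFFu >> shift); i++)
                t->right[lo | i] |= start;
            t->middle[(pattern >> shift) & 0xFFu] |= start;
        }
    }

    /* Count occurrences of the pattern at any bit offset in data[0..len). */
    static size_t count_matches(const PatternTables *t,
                                const uint8_t *data, size_t len)
    {
        size_t total = 0;
        if (len < 2)
            return 0;  /* a 16-bit pattern needs at least two bytes */

        for (size_t i = 1; i + 1 < len; i++) {
            uint8_t hits = (uint8_t)(t->left[data[i - 1]] &
                                     t->middle[data[i]] &
                                     t->right[data[i + 1]]);
            for (; hits; hits &= (uint8_t)(hits - 1))  /* one pass per set bit */
                total++;
        }

        /* Tail: a pattern that occupies exactly the last two bytes. */
        if (t->left[data[len - 2]] & t->middle[data[len - 1]] & 0x80)
            total++;
        return total;
    }
    ```

    With the Pattern 0x7777, the 3-byte buffer {0x0E, 0xEE, 0xE0} contains one match (the 4th arrangement from the example), and {0x00, 0x77, 0x77} contains one match that is found only by the tail check.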
    

    Advantage:

    This algorithm takes only N-1 logical steps to find a Pattern in an array of N bytes, each step being a few table lookups and AND operations. The only overhead is filling the lookup tables initially, which costs the same in all cases. So this is very effective for searching huge byte streams.
