I need to scan for a 16 bit word in a bit stream. It is not guaranteed to be aligned on byte or word boundaries.
What is the fastest way of achieving this
I would like to suggest a solution using 3 lookup tables of size 256. This would be efficient for large bit streams. This solution takes 3 bytes in a sample for comparison. Following figure shows all possible arrangements of a 16 bit data in 3 bytes. Each byte region has shown in different color.
alt text http://img70.imageshack.us/img70/8711/80541519.jpg
Here checking for 1 to 8 will be taken care in first sample and 9 to 16 in next sample and so on. Now when we are searching for a Pattern, we will find all the 8 possible arrangements (as below) of this Pattern and will store in 3 lookup tables (Left, Middle and Right).
Initializing Lookup Tables:
Lets take an example 0111011101110111
as a Pattern to find. Now consider 4th arrangement. Left part would be XXX01110
. Fill all raws of Left lookup table pointing by left part (XXX01110
) with 00010000
. 1 indicates starting position of arrangement of input Pattern. Thus following 8 raws of Left look up table would be filled by 16 (00010000
).
00001110
00101110
01001110
01101110
10001110
10101110
11001110
11101110
Middle part of arrangement would be 11101110
. Raw pointing by this index (238) in Middle look up table will be filled by 16 (00010000
).
Now Right part of arrangement would be 111XXXXX
. All raws (32 raws) with index 111XXXXX
will be filled by 16 (00010000
).
We should not overwrite elements in look up table while filling. Instead do a bitwise OR operation to update an already filled raw. In above example, all raws written by 3rd arrangement would be updated by 7th arrangement as follows.
Thus raws with index XX011101
in Left lookup table and 11101110
in Middle lookup table and 111XXXXX
in Right lookup table will be updated to 00100010
by 7th arrangement.
Searching Pattern:
Take a sample of three bytes. Find Count as follows where Left is left lookup table, Middle is middle lookup table and Right is right lookup table.
Count = Left[Byte0] & Middle[Byte1] & Right[Byte2];
Number of 1 in Count gives the number of matching Pattern in taken sample.
I can give some sample code which is tested.
Initializing lookup table:
for( RightShift = 0; RightShift < 8; RightShift++ )
{
LeftShift = 8 - RightShift;
Starting = 128 >> RightShift;
Byte = MSB >> RightShift;
Count = 0xFF >> LeftShift;
for( i = 0; i <= Count; i++ )
{
Index = ( i << LeftShift ) | Byte;
Left[Index] |= Starting;
}
Byte = LSB << LeftShift;
Count = 0xFF >> RightShift;
for( i = 0; i <= Count; i++ )
{
Index = i | Byte;
Right[Index] |= Starting;
}
Index = ( unsigned char )(( Pattern >> RightShift ) & 0xFF );
Middle[Index] |= Starting;
}
Searching Pattern:
Data is stream buffer, Left is left lookup table, Middle is middle lookup table and Right is right lookup table.
for( int Index = 1; Index < ( StreamLength - 1); Index++ )
{
Count = Left[Data[Index - 1]] & Middle[Data[Index]] & Right[Data[Index + 1]];
if( Count )
{
TotalCount += GetNumberOfOnes( Count );
}
}
Limitation:
Above loop cannot detect a Pattern if it is placed at the very end of stream buffer. Following code need to add after loop to overcome this limitation.
Count = Left[Data[StreamLength - 2]] & Middle[Data[StreamLength - 1]] & 128;
if( Count )
{
TotalCount += GetNumberOfOnes( Count );
}
Advantage:
This algorithm takes only N-1 logical steps to find a Pattern in an array of N bytes. Only overhead is to fill the lookup tables initially which is constant in all the cases. So this will be very effective for searching huge byte streams.