I'm looking to optimize this linear search:
static int
linear (const int *arr, int n, int key)
{
int i = 0;
while (i < n) {
uint32 LinearFindSse4( uint8* data, size_t data_len, uint8* finddata, size_t finddatalen )
{
/**
* the following is based on...
* #define haszero(v) (((v) - 0x01010101UL) & ~(v) & 0x80808080UL)
* we split it into 2 sections
* first section is:
* (v) - 0x01010101UL)
*
* second section is:
* ~(v) & 0x80808080UL)
*/
__m128i ones = _mm_set1_epi8( 0x01 );
__m128i eights = _mm_set1_epi8( 0x80 );
__m128i find_field = _mm_set1_epi8( finddata[0] );
uint32 found_at = 0;
for (int i = 0; i < data_len; i+=16)
{
/* compare finddatalen bytes; sizeof(finddata) would only be the size of the pointer */
#define CHECKTHIS( n ) if (!memcmp(&data[i+n], &finddata[0], finddatalen)) { found_at = i + n; break; }
__m128i chunk = _mm_stream_load_si128( (__m128i *)&data[i] );
__m128i xor_result = _mm_xor_si128( chunk, find_field );
__m128i first_sec = _mm_sub_epi64( xor_result, ones );
__m128i second_sec = _mm_andnot_si128( xor_result, eights );
if(!_mm_testz_si128(first_sec, second_sec))
{
CHECKTHIS(0);
CHECKTHIS(1);
CHECKTHIS(2);
CHECKTHIS(3);
CHECKTHIS(4);
CHECKTHIS(5);
CHECKTHIS(6);
CHECKTHIS(7);
CHECKTHIS(8);
CHECKTHIS(9);
CHECKTHIS(10);
CHECKTHIS(11);
CHECKTHIS(12);
CHECKTHIS(13);
CHECKTHIS(14);
CHECKTHIS(15);
}
}
return found_at;
}
If you had a quantum computer, you could use Grover's algorithm to search your data in O(N^(1/2)) time using O(log N) storage space. Otherwise, your question is pretty silly. Binary search or one of its variants (ternary search, for example) is really your best choice. Doing micro-optimizations on a linear search is stupid when you can pick a superior algorithm.
So far you have received several pieces of advice, most of which state that linear search makes no sense on sorted data when binary search will work much more efficiently. This often happens to be one of those popular "sounds right" assertions made by people who don't care to give the problem too much thought. In reality, if you consider the bigger picture, given the right circumstances, linear search can be much more efficient than binary search.
Note that if we consider a single search query on a sorted array, binary search is a significantly more efficient method than linear search. There's no argument about that. Also, when you perform multiple completely random queries on the same data, binary search still wins over linear search.
However, the picture begins to change if we consider sequential search queries that are not exactly random. Imagine that the queries arrive in sorted order, i.e. each query is for a higher value than the previous one. They don't have to be globally and strictly sorted: from time to time the query sequence might get "reset", i.e. a low value is queried, but on average the subsequent queries should arrive in increasing order. In other words, the queries arrive in series, each series sorted in ascending order. In this case, if the average length of the series is comparable to the length of your array, linear search will outperform binary search by a huge margin.
To take advantage of this situation, you have to implement your search in an incremental manner. It is simple: if the next query is greater than the previous one, you don't need to start the search from the beginning of the array. Instead, you can search from the point where the previous search stopped. The most simplistic implementation (just to illustrate the idea) might look as follows
static int linear(const int *arr, int n, int key)
{
static int previous_key = INT_MIN;
static int previous_i = 0;
int i = key >= previous_key ? previous_i : 0;
while (i < n) {
if (arr[i] >= key)
break;
++i;
}
previous_key = key;
previous_i = i;
return i;
}
(Disclaimer: the above implementation is terribly ugly for the obvious reason that the array arrives from outside as a parameter, while the previous search state is stored internally. Of course, this is the wrong way to do it in practice. But again, the above is intended to illustrate the idea and no more.)
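One way to address the problem named in the disclaimer is to keep the previous search state next to the array it belongs to, instead of in function-level statics. The following is only a sketch of that idea: the `search_cursor` type and the `cursor_init` and `cursor_search` names are hypothetical and not part of the original answer, and the array is assumed to be sorted in ascending order.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper type: remembers where the previous query
 * on one particular sorted array stopped. */
typedef struct {
    const int *arr;    /* sorted array being searched */
    int n;             /* number of elements in arr */
    int previous_key;  /* key of the last query */
    int previous_i;    /* index returned by the last query */
} search_cursor;

static void cursor_init(search_cursor *c, const int *arr, int n)
{
    assert(c != NULL && arr != NULL && n >= 0);
    c->arr = arr;
    c->n = n;
    c->previous_key = INT_MIN;
    c->previous_i = 0;
}

/* Returns the index of the first element >= key, or n if there is none. */
static int cursor_search(search_cursor *c, int key)
{
    /* Resume from the previous position only if the keys are non-decreasing. */
    int i = key >= c->previous_key ? c->previous_i : 0;
    while (i < c->n && c->arr[i] < key)
        ++i;
    c->previous_key = key;
    c->previous_i = i;
    return i;
}
```

Each array gets its own cursor, so two interleaved query streams on different arrays no longer corrupt each other's saved position.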
Note that the complexity of processing each series of ordered queries using the above approach is always O(N), regardless of the length of the series. Using binary search, the complexity would be O(M * log N), where M is the number of queries in the series. So, for obvious reasons, when M is close to N, i.e. queries arrive in sufficiently long ordered series, the above linear search will significantly outperform binary search, while for small M binary search will win.
Also, even if the ordered series of queries are not very long, the above modification might still give you a noticeable improvement in search performance, considering that you have to use linear search.
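To put rough numbers on the comparison between O(N) for a whole ordered series and O(M * log N) for M separate binary searches, here is a worked example. It assumes the constant factors hidden by the big-O notation are equal, which is a simplification, and the array size is picked purely for round arithmetic.

```latex
\begin{align*}
\text{array size:}\quad & N = 2^{20} = 1\,048\,576, \qquad \log_2 N = 20 \\
\text{binary search, } M = N:\quad & M \log_2 N = 20 \cdot 1\,048\,576 = 20\,971\,520 \text{ steps} \\
\text{incremental linear search:}\quad & \text{about } N = 1\,048\,576 \text{ steps for the whole series} \\
\text{break-even:}\quad & M \log_2 N = N \;\Rightarrow\; M = \frac{N}{\log_2 N} \approx 52\,429
\end{align*}
```

Under these assumptions, series shorter than roughly 52 thousand queries favor binary search and longer ones favor the incremental linear search; real break-even points depend on cache behavior and branch prediction and need to be measured.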
P.S. As an additional piece of information about the structure of the problem:
When you need to perform the search in an ordered array of length N and you know in advance that the queries will arrive in ordered series of [approximate, average] length M, the optimal algorithm will look as follows.
First, calculate the stride S = [N/M]. It might also make sense to "snap" the value of S to the [nearest] power of 2. Think of your sorted array as a sequence of blocks of length S, so-called S-blocks.
Then, for each query, do a linear search with stride S for the S-block that may contain the queried value (of course, remember to start from the block where the previous search left off), and finish with a binary search inside that S-block.
The above is the optimal incremental search algorithm, in the sense that it achieves the theoretical limit on the asymptotic efficiency of repetitive search. Note that if the value of M is much smaller than N, the algorithm "automatically" shifts itself towards binary search, while when M gets close to N the algorithm "automatically" favors linear search. The latter makes sense because in such an environment linear search is significantly more efficient than binary search.
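The block-based scheme can be sketched roughly as follows: step over the array in blocks of S elements, resuming from the block where the previous query ended, then binary-search inside the block that may hold the key. This is a minimal illustration, not a tuned implementation. The `block_cursor` type and both function names are hypothetical, the array is assumed sorted ascending, and the stride is simply N / M clamped to at least 1 (no snapping to a power of 2).

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical state: one sorted array, a fixed stride S, and the
 * start of the block where the previous query ended. */
typedef struct {
    const int *arr;
    int n;
    int stride;        /* S, roughly N / M, at least 1 */
    int previous_key;
    int block_start;   /* always a multiple of stride */
} block_cursor;

static void block_cursor_init(block_cursor *c, const int *arr, int n,
                              int expected_series_len)
{
    assert(c != NULL && arr != NULL && n >= 0 && expected_series_len > 0);
    c->arr = arr;
    c->n = n;
    c->stride = n / expected_series_len;
    if (c->stride < 1)
        c->stride = 1;
    c->previous_key = INT_MIN;
    c->block_start = 0;
}

/* Returns the index of the first element >= key, or n if there is none. */
static int block_search(block_cursor *c, int key)
{
    int start = key >= c->previous_key ? c->block_start : 0;

    /* Linear search with stride S: skip every full block whose last
     * element is still smaller than the key. */
    while (start + c->stride <= c->n && c->arr[start + c->stride - 1] < key)
        start += c->stride;

    /* Binary search (lower bound) inside the block [start, hi). */
    int lo = start;
    int hi = start + c->stride < c->n ? start + c->stride : c->n;
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        if (c->arr[mid] < key)
            lo = mid + 1;
        else
            hi = mid;
    }

    c->previous_key = key;
    c->block_start = start;
    return lo;
}
```

With a stride of 1 this degenerates into the plain incremental linear search, and with a stride of N it is a single binary search over the whole array, which matches the two extremes described above.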
This all is just to illustrate the fact that blanket statements like "linear search on a sorted array is always useless" indicate nothing else than lack of knowledge on the part of those who make such statements.
this might force vector instructions (suggested by Gman):
for (int i = 0; i < N; i += 4) {
bool found = false;
found |= (array[i+0] >= key);
...
found |= ( array[i+3] >= key);
// slight variation would be to use max intrinsic
if (found) return i;
}
...
// quick search among four elements
This also executes fewer branch instructions. You may help by ensuring the input array is aligned to a 16-byte boundary.
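For reference, here is one way the fragment above could be completed into a whole function, including the "quick search among four elements" step and a tail for lengths that are not a multiple of four. It is a sketch under assumptions: the name `linear_blocks4` is made up, the array is assumed sorted ascending, and whether a compiler actually turns the four comparisons into vector instructions depends on the compiler and its flags.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the index of the first element >= key, or n if there is none. */
static int linear_blocks4(const int *arr, int n, int key)
{
    assert(arr != NULL && n >= 0);
    int i = 0;

    /* Whole blocks of four: the comparisons are independent of each other
     * and are combined with a bitwise OR, leaving one branch per block. */
    for (; i + 4 <= n; i += 4) {
        int found = 0;
        found |= (arr[i + 0] >= key);
        found |= (arr[i + 1] >= key);
        found |= (arr[i + 2] >= key);
        found |= (arr[i + 3] >= key);
        if (found)
            break;
    }

    /* Finish inside the matching block, or in the tail of fewer than
     * four elements when no full block matched. */
    while (i < n && arr[i] < key)
        ++i;
    return i;
}
```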
another thing that may help vectorization (doing vertical max comparison):
for (int i = 0; i < N; i += 8) {
bool found = false;
found |= max(array[i+0], array[i+4]) >= key;
...
found |= max(array[i+3], array[i+7]) >= key;
if (found) return i;
}
// have to search eight elements
Well, it makes about as much sense as a linear search through a sorted array!
(More seriously, can you give us some clues about why no binary search?)
This answer is a little more obscure than my other one, so I'm posting it separately. It relies on the fact that C guarantees a boolean result false=0 and true=1. X86 can produce booleans without branching, so it might be faster, but I haven't tested it. Micro-optimizations like these will always be highly dependent on your processor and compiler.
As before, the caller is responsible for putting a sentinel value at the end of the array to ensure that the loop terminates.
Determining the optimum amount of loop unrolling takes some experimentation. You want to find the point of diminishing (or negative) returns. I'm going to take a SWAG and try 8 this time.
static int
linear (const int *arr, int n, int key)
{
assert(arr[n] >= key);
int i = 0;
while (arr[i] < key) {
i += (arr[i] < key);
i += (arr[i] < key);
i += (arr[i] < key);
i += (arr[i] < key);
i += (arr[i] < key);
i += (arr[i] < key);
i += (arr[i] < key);
i += (arr[i] < key);
}
return i;
}
Edit: As Mark points out, this function introduces a dependency in each line on the line preceding it, which limits the ability of the processor pipeline to run operations in parallel. So let's try a small modification to the function to remove the dependency. Now the function does indeed require 8 sentinel elements at the end.
static int
linear (const int *arr, int n, int key)
{
assert(arr[n] >= key);
assert(arr[n+7] >= key);
int i = 0;
while (arr[i] < key) {
int j = i;
i += (arr[j] < key);
i += (arr[j+1] < key);
i += (arr[j+2] < key);
i += (arr[j+3] < key);
i += (arr[j+4] < key);
i += (arr[j+5] < key);
i += (arr[j+6] < key);
i += (arr[j+7] < key);
}
return i;
}
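Since the caller has to supply the sentinels, a small helper on the calling side may be useful. The sketch below is an assumption about how one might do that: `copy_with_sentinels` and `linear_unrolled8` are hypothetical names, the search body repeats the unrolled loop from above so the block compiles on its own, and `INT_MAX` is used as the sentinel because no `int` key can exceed it.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

enum { SENTINELS = 8 };

/* Hypothetical helper: returns a malloc'd copy of n sorted ints followed
 * by SENTINELS copies of INT_MAX, or NULL on allocation failure.
 * The caller frees the result. */
static int *copy_with_sentinels(const int *src, size_t n)
{
    int *padded = malloc((n + SENTINELS) * sizeof *padded);
    if (padded == NULL)
        return NULL;
    if (n > 0)
        memcpy(padded, src, n * sizeof *padded);
    for (size_t k = 0; k < SENTINELS; ++k)
        padded[n + k] = INT_MAX;
    return padded;
}

/* Same loop as the unrolled search above; requires the padding. */
static int linear_unrolled8(const int *arr, int n, int key)
{
    assert(arr[n] >= key && arr[n + SENTINELS - 1] >= key);
    int i = 0;
    while (arr[i] < key) {
        int j = i;
        /* Each comparison reads from a fixed offset of j, so none of them
         * depends on the result of the previous one. */
        i += (arr[j] < key);
        i += (arr[j + 1] < key);
        i += (arr[j + 2] < key);
        i += (arr[j + 3] < key);
        i += (arr[j + 4] < key);
        i += (arr[j + 5] < key);
        i += (arr[j + 6] < key);
        i += (arr[j + 7] < key);
    }
    return i;
}
```

A caller would build the padded copy once, run many searches against it, and free it afterwards; the copy costs O(N), so this only pays off when the array is searched repeatedly.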