I'm looking to optimize this linear search:
static int
linear (const int *arr, int n, int key)
{
    int i = 0;
    while (i < n) {
This answer is a little more obscure than my other one, so I'm posting it separately. It relies on the fact that C guarantees a comparison evaluates to an int that is exactly 0 (false) or 1 (true). x86 can produce such booleans without branching (for example with the SETcc instructions), so it might be faster, but I haven't tested it. Micro-optimizations like these are always highly dependent on your processor and compiler.
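To make the 0-or-1 guarantee concrete, here is a minimal sketch that counts the elements below a key without any `if`. The name `count_less` is made up for illustration and is not part of the question or the answer. On a sorted array this count is the index of the first element that is not less than the key, which is the idea the unrolled loops build on.

```c
#include <assert.h>

/* Hypothetical helper: count how many of arr[0..n) are less than key.
   The comparison evaluates to exactly 0 or 1, so it can be added directly. */
static int
count_less (const int *arr, int n, int key)
{
    assert (n >= 0);
    int count = 0;
    for (int i = 0; i < n; i++)
        count += (arr[i] < key);
    return count;
}
```

Whether the compiler actually emits branch-free code for this depends on the compiler and its flags, so it is worth checking the generated assembly.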
As before, the caller is responsible for putting a sentinel value at the end of the array (at arr[n], and no smaller than any key that will be searched for) to ensure that the loop terminates.
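One way the caller might set up the sentinel is sketched below. It assumes the data can be copied into a fresh buffer and that `INT_MAX` is acceptable as the sentinel, since it is never less than any `int` key. The helper name `with_sentinels` and its `pad` parameter are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy n sorted ints into a new buffer that has `pad`
   extra slots at the end, each set to INT_MAX. A scan of the form
   `while (arr[i] < key)` then stops at index n at the latest.
   Returns NULL on allocation failure; the caller frees the result. */
static int *
with_sentinels (const int *src, size_t n, size_t pad)
{
    assert (src != NULL || n == 0);
    int *buf = malloc ((n + pad) * sizeof *buf);
    if (buf == NULL)
        return NULL;
    if (n > 0)
        memcpy (buf, src, n * sizeof *buf);
    for (size_t k = 0; k < pad; k++)
        buf[n + k] = INT_MAX;
    return buf;
}
```

A single-sentinel search would use `pad` of 1; a variant that reads several elements ahead needs a correspondingly larger `pad`.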
Determining the optimum amount of loop unrolling takes some experimentation. You want to find the point of diminishing (or negative) returns. I'm going to take a SWAG and try 8 this time.
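#include <assert.h>

static int
linear (const int *arr, int n, int key)
{
    /* The caller must have stored a sentinel >= key at arr[n]. */
    assert(arr[n] >= key);
    int i = 0;
    while (arr[i] < key) {
        /* Each comparison is 0 or 1, so i stops advancing once arr[i] >= key. */
        i += (arr[i] < key);
        i += (arr[i] < key);
        i += (arr[i] < key);
        i += (arr[i] < key);
        i += (arr[i] < key);
        i += (arr[i] < key);
        i += (arr[i] < key);
        i += (arr[i] < key);
    }
    return i;
}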
Edit: As Mark points out, this function makes each line depend on the line before it, which limits the ability of the processor pipeline to run operations in parallel. So let's try a small modification to the function to remove that dependency from the loads and comparisons, leaving only the additions chained. Now the function does indeed require 8 sentinel elements at the end (arr[n] through arr[n+7]).
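#include <assert.h>

static int
linear (const int *arr, int n, int key)
{
    /* arr must be sorted, with sentinels >= key in arr[n] .. arr[n+7]. */
    assert(arr[n] >= key);
    assert(arr[n+7] >= key);
    int i = 0;
    while (arr[i] < key) {
        /* All eight loads index from j, so none waits on an earlier update of i.
           Because arr is sorted, the number of elements below key in
           arr[j] .. arr[j+7] is exactly how far i should advance. */
        int j = i;
        i += (arr[j] < key);
        i += (arr[j+1] < key);
        i += (arr[j+2] < key);
        i += (arr[j+3] < key);
        i += (arr[j+4] < key);
        i += (arr[j+5] < key);
        i += (arr[j+6] < key);
        i += (arr[j+7] < key);
    }
    return i;
}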
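Since the right unroll factor has to be found by experiment, a small harness helps both to confirm that an unrolled variant still returns the right index and to time it. The sketch below is only one possible setup. `linear_ref`, `linear_unrolled8`, `search_fn` and `time_search` are hypothetical names: the first is a plain bounds-checked loop used as a reference, and the second is the 8-way version above under a different name so that both can live in one file.

```c
#include <assert.h>
#include <time.h>

/* Reference: bounds-checked scan; returns the first index whose element
   is >= key, or n if there is none. */
static int
linear_ref (const int *arr, int n, int key)
{
    int i = 0;
    while (i < n && arr[i] < key)
        i++;
    return i;
}

/* The 8-way version from the answer, renamed. Requires arr to be sorted
   and arr[n] .. arr[n+7] to hold sentinels >= key. */
static int
linear_unrolled8 (const int *arr, int n, int key)
{
    assert (arr[n] >= key);
    assert (arr[n + 7] >= key);
    int i = 0;
    while (arr[i] < key) {
        int j = i;
        i += (arr[j] < key);
        i += (arr[j + 1] < key);
        i += (arr[j + 2] < key);
        i += (arr[j + 3] < key);
        i += (arr[j + 4] < key);
        i += (arr[j + 5] < key);
        i += (arr[j + 6] < key);
        i += (arr[j + 7] < key);
    }
    return i;
}

typedef int (*search_fn) (const int *, int, int);

/* Hypothetical harness: CPU seconds spent running `reps` passes that
   search for every key in 0 .. max_key. */
static double
time_search (search_fn fn, const int *arr, int n, int max_key, int reps)
{
    volatile unsigned sink = 0;  /* keeps the calls from being optimized away */
    clock_t start = clock ();
    for (int r = 0; r < reps; r++)
        for (int key = 0; key <= max_key; key++)
            sink += (unsigned) fn (arr, n, key);
    clock_t stop = clock ();
    (void) sink;
    return (double) (stop - start) / CLOCKS_PER_SEC;
}
```

Calling `time_search` once per variant on the same padded array gives a rough comparison. Calling through a function pointer prevents inlining, so the numbers are only indicative, and `clock()` is coarse enough that `reps` needs to be large for small arrays.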