I just stumbled upon this question today and was trying to find a solution that is better than O(N), but could not come up with one.
Searched through SO but couldn't find
Make it parallel: divide the array into chunks and search the chunks in parallel. The complexity is still O(n), but the running time will be much lower. Ideally it is inversely proportional to the number of processors you have, i.e. roughly O(n/p) on p processors.
You can use the Parallel Patterns Library (PPL) in C++.
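To make the chunking idea concrete, here is a rough sketch. Note that it does not use PPL: it swaps in `std::async` from the standard library so it compiles anywhere with C++11. The function name `parallel_find`, the `int` element type, and the "return -1 when absent" convention are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <future>
#include <thread>
#include <vector>

// Hypothetical helper (not from PPL): returns the index of the first
// occurrence of `target` in `data`, or -1 if it is not present.
long parallel_find(const std::vector<int>& data, int target,
                   unsigned workers = std::thread::hardware_concurrency()) {
    if (workers == 0) workers = 1;  // hardware_concurrency() may return 0

    const std::size_t n = data.size();
    const std::size_t chunk = (n + workers - 1) / workers;  // ceil(n / workers)

    // Launch one linear search per chunk.
    std::vector<std::future<long>> results;
    for (std::size_t begin = 0; begin < n; begin += chunk) {
        const std::size_t end = std::min(begin + chunk, n);
        results.push_back(std::async(std::launch::async,
            [&data, target, begin, end] {
                auto first = data.begin() + begin;
                auto last = data.begin() + end;
                auto it = std::find(first, last, target);
                return it == last ? -1L
                                  : static_cast<long>(it - data.begin());
            }));
    }

    // Chunks are collected in order, so the first hit is the lowest index.
    long found = -1;
    for (auto& r : results) {
        const long idx = r.get();
        if (found == -1 && idx != -1) found = idx;
    }
    return found;
}
```

Call it as `parallel_find(v, 9)` to use one chunk per hardware thread, or pass the worker count explicitly. This version always scans every chunk to the end; stopping the other workers early once one finds the target would need a shared flag, which is left out to keep the sketch short. For small arrays the thread start-up cost will likely outweigh any gain.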