For one of my OS X programs, I have a few optimized cases that use SSE4.1 instructions. On SSE3-only machines, the non-optimized branch is run:
// SupportsSSE4
You can make a CPU dispatcher. You can do this in one file, but you have to compile it twice: first with SSE4.1 enabled to produce an object file, then without SSE4.1, linking in the SSE4.1 object file. The first time you call your function myfunc, it goes through myfunc_dispatch, which determines the instruction set and points myfunc_pointer at either myfunc_SSE41 or myfunc_SSE3. Every later call to myfunc jumps straight to the function for your instruction set.
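// clang -c -O3 -msse4.1 foo.cpp -o foo_sse41.o
// clang -O3 -msse3 foo.cpp foo_sse41.o
typedef float MyFuncType(float*);
MyFuncType myfunc, myfunc_SSE41, myfunc_SSE3, myfunc_dispatch;
bool SupportsSSE4_1();  // runtime CPU check, assumed to be defined elsewhere

#ifdef __SSE4_1__
// Only this function is compiled in the -msse4.1 pass.
float myfunc_SSE41(float* a) {
    // SSE4.1 code
}
#else
// Defined only in the non-SSE4.1 pass so the symbol is not duplicated at link time.
// Starts at the dispatcher; the first call repoints it at the chosen version.
MyFuncType* myfunc_pointer = &myfunc_dispatch;

float myfunc_SSE3(float* a) {
    // SSE3 code
}

float myfunc_dispatch(float* a) {
    if (SupportsSSE4_1()) {
        myfunc_pointer = myfunc_SSE41;
    } else {
        myfunc_pointer = myfunc_SSE3;
    }
    return myfunc_pointer(a);
}

float myfunc(float* a) {
    return (*myfunc_pointer)(a);
}

int main() {
    // myfunc(a);
}
#endif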
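The dispatcher above calls SupportsSSE4_1() without defining it. Here is one possible way to write it, as a minimal sketch rather than the code from the original post. It assumes an x86 target and a GCC or Clang toolchain that provides <cpuid.h> with __get_cpuid; the non-x86 fallback is my own addition.

```cpp
// Hypothetical implementation of SupportsSSE4_1(); not from the original post.
// Assumes an x86 target and a GCC/Clang toolchain that ships <cpuid.h>.
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

bool SupportsSSE4_1() {
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    // __get_cpuid returns 0 if the requested leaf is unsupported.
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        return false;
    }
    // CPUID leaf 1 reports SSE4.1 in bit 19 of ECX.
    return (ecx & (1u << 19)) != 0;
#else
    return false;  // not an x86 CPU, so no SSE4.1
#endif
}
```

Because foo.cpp is compiled twice, a definition like this should live in the non-SSE4.1 branch or in a separate source file, so that it ends up in only one object file.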