Short version: I'd like to know whether there are implementations of the standard trigonometric functions that are faster than the ones included in math.h.
For a 2-3% gain, this is almost certainly not worth the risk of inaccuracy, error, assumptions no longer being true (e.g. results never falling outside of [-1, 1]), etc., unless you are planning on running this on a huge number of machines (where 2-3% represents thousands or millions of dollars in electricity and amortized cost of the machines).
That said, if you have domain-specific knowledge about what you are trying to accomplish, you may be able to speed up your computations by a factor of two or more. For example, if you always need sin and cos of the same value, calculate them close to each other in the code and make sure that your compiler translates them into an FSINCOS assembly instruction (see this question). If you need only a small portion of the full range of the function, you can potentially use a set of low-order polynomials followed by an iteration of Newton's method to get full machine precision (or as much as you need). Again, this is much more powerful if you know that you only need some values: for example, if you will only be needing values near zero, you can exploit the fact that sin(x) is close to x there and dramatically decrease the number of terms you need.
But, again, my primary advice is: 2-3% is not worth it. Think harder about the algorithms used and other potential bottlenecks (e.g. is malloc eating too much time?) before you optimize this.