I have the following code that implements Sin/Cos functions using a precalculated lookup table. In the following example the table has 1024*128 items covering all the Sin/Cos values f
One thing you could try is to use the identity cos(x) = sin(x + pi/2): make the sine table one quarter larger, so that you can reuse it as the cosine table starting one quarter of the way in. I'm not sure whether C# lets you get a pointer into the middle of the table, as C does. But even if it doesn't, the reduced cache usage might be worth more than the added cost of computing the offset into the sine table.
That is, expressed in C:
double* _CosineDoubleTable = &_SineDoubleTable[TABLESIZE / 4];