Question
I am not clear on the best way to use sincos(). I have looked everywhere, but the consensus seems to be simply that it is better than computing sin and cos separately. Below is essentially what I have in my kernel for using sincos. However, when I time it against calling sin and cos separately, it comes out slower. I think it has to do with how I'm using cPtr and sPtr. Is there a better way?
int idx = blockIdx.x * blockDim.x + threadIdx.x;
if (idx < dataSize)
{
    idx += lower;
    double f = ((double) idx) * deltaF;
    double cosValue;
    double sinValue;
    double *sPtr = &sinValue;
    double *cPtr = &cosValue;
    sincos(twopit * f, sPtr, cPtr);
    d_re[idx - lower] = cosValue;
    d_im[idx - lower] = - sinValue;
    //d_re[idx - lower] = cos(twopit * f);
    //d_im[idx - lower] = - sin(twopit * f);
}
Answer 1:
The pointer variables are redundant; you can pass the addresses directly, e.g.
double cosValue;
double sinValue;
sincos(twopit * f, &sinValue, &cosValue);
but I'm not sure this will have much effect on performance (worth a try though).
Also consider using float rather than double where precision requirements permit, and use the corresponding single-precision functions (sincosf in this case).
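For illustration, here is a rough sketch of what the single-precision version might look like. The names twiddleKernel and computeTwiddles are made up for this example, and passing lower, deltaF and twopit as kernel arguments is an assumption, since the question does not show how they are defined. Whether float is accurate enough depends on how large twopit * f gets, so treat this as a starting point for your own timing and accuracy checks rather than a drop-in replacement.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical single-precision variant of the kernel body in the question.
// lower, deltaF and twopit are assumed to be plain kernel arguments here.
__global__ void twiddleKernel(float *d_re, float *d_im, int dataSize,
                              int lower, float deltaF, float twopit)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < dataSize)
    {
        float f = (float)(idx + lower) * deltaF;
        float sinValue;
        float cosValue;
        // Pass the addresses directly; no extra pointer variables are needed.
        sincosf(twopit * f, &sinValue, &cosValue);
        d_re[idx] = cosValue;
        d_im[idx] = -sinValue;
    }
}

// Hypothetical host helper: fills re and im (same length) on the GPU.
// Returns false if any CUDA call reports an error.
bool computeTwiddles(std::vector<float> &re, std::vector<float> &im,
                     int lower, float deltaF, float twopit)
{
    const int n = static_cast<int>(re.size());
    if (n == 0 || im.size() != re.size()) return false;

    const size_t bytes = n * sizeof(float);
    float *d_re = nullptr;
    float *d_im = nullptr;
    if (cudaMalloc(&d_re, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&d_im, bytes) != cudaSuccess) { cudaFree(d_re); return false; }

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    twiddleKernel<<<blocks, threads>>>(d_re, d_im, n, lower, deltaF, twopit);

    // Check the launch, then copy back; the blocking cudaMemcpy waits for
    // the kernel to finish on the default stream.
    bool ok = cudaGetLastError() == cudaSuccess;
    ok = ok && cudaMemcpy(re.data(), d_re, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    ok = ok && cudaMemcpy(im.data(), d_im, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;

    cudaFree(d_re);
    cudaFree(d_im);
    return ok;
}
```

Calling computeTwiddles(re, im, lower, deltaF, twopit) with two equally sized vectors should fill them with cos and -sin of twopit * (i + lower) * deltaF, which you can compare against the double-precision results before deciding to switch.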
Source: https://stackoverflow.com/questions/11574789/best-way-to-approach-using-sincos-in-cuda