Best way to approach using sincos() in CUDA

Submitted by 别来无恙 on 2020-01-06 15:20:50

Question


I am not clear on the best way to use sincos(). I've looked everywhere, but the only consensus seems to be that it is better than computing sin and cos separately. Below is essentially what my kernel does with sincos(). However, when I time it against computing sin and cos separately, it comes out slower. I suspect it has to do with how I'm using cPtr and sPtr. Is there a better way?

int idx = blockIdx.x * blockDim.x + threadIdx.x;

if (idx < dataSize)
{
    idx += lower;
    double f = ((double) idx) * deltaF;
    double cosValue;
    double sinValue;
    double *sPtr = &sinValue;
    double *cPtr = &cosValue;
    sincos(twopit * f, sPtr, cPtr);

    d_re[idx - lower] = cosValue;
    d_im[idx - lower] = - sinValue;

    //d_re[idx - lower] = cos(twopit * f);
    //d_im[idx - lower] = - sin(twopit * f);
}

Answer 1:


The pointers are redundant; you can get rid of them and pass the addresses of the variables directly, e.g.

double cosValue;
double sinValue;
sincos(twopit * f, &sinValue, &cosValue);

but I'm not sure this will have much effect on performance (worth a try though).
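
For context, here is a minimal sketch of how the whole kernel might look with the pointer temporaries removed. The variable names follow the question; the kernel name fillPhasors, its parameter list, and the host helper maxErrorVsHost are hypothetical and exist only to make the example self-contained. It checks results against the host's cos and sin and says nothing about speed, which would still need to be measured on the target GPU.

```cuda
#include <cmath>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel name and signature; the body mirrors the question's
// code, with &sinValue and &cosValue passed to sincos() directly.
__global__ void fillPhasors(double *d_re, double *d_im, int dataSize,
                            int lower, double deltaF, double twopit)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < dataSize)
    {
        double f = (double)(idx + lower) * deltaF;
        double sinValue, cosValue;
        sincos(twopit * f, &sinValue, &cosValue);
        d_re[idx] = cosValue;
        d_im[idx] = -sinValue;
    }
}

// Hypothetical host helper: runs the kernel and returns the largest absolute
// difference from the host's cos/sin, or -1.0 if a CUDA call fails.
double maxErrorVsHost(int dataSize, int lower, double deltaF, double twopit)
{
    size_t bytes = (size_t)dataSize * sizeof(double);
    double *d_re = nullptr, *d_im = nullptr;
    if (cudaMalloc(&d_re, bytes) != cudaSuccess) return -1.0;
    if (cudaMalloc(&d_im, bytes) != cudaSuccess) { cudaFree(d_re); return -1.0; }

    int threads = 256;
    int blocks = (dataSize + threads - 1) / threads;
    fillPhasors<<<blocks, threads>>>(d_re, d_im, dataSize, lower, deltaF, twopit);

    std::vector<double> re(dataSize), im(dataSize);
    cudaError_t e1 = cudaMemcpy(re.data(), d_re, bytes, cudaMemcpyDeviceToHost);
    cudaError_t e2 = cudaMemcpy(im.data(), d_im, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_re);
    cudaFree(d_im);
    if (e1 != cudaSuccess || e2 != cudaSuccess) return -1.0;

    double maxErr = 0.0;
    for (int i = 0; i < dataSize; ++i)
    {
        double x = twopit * ((double)(i + lower) * deltaF);
        double errRe = std::fabs(re[i] - std::cos(x));
        double errIm = std::fabs(im[i] + std::sin(x));  // im holds -sin(x)
        if (errRe > maxErr) maxErr = errRe;
        if (errIm > maxErr) maxErr = errIm;
    }
    return maxErr;
}
```

Calling maxErrorVsHost(1024, 0, 0.001, 6.283185307179586) from main() should return a very small non-negative number when a CUDA device is available.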

Also consider using float rather than double where precision requirements permit, and use the corresponding single-precision functions (sincosf in this case).
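
A single-precision variant might look like the sketch below. It assumes float accuracy is acceptable for the phase values involved. The kernel name fillPhasorsF, its signature, and the host helper maxErrorVsHostF are hypothetical; only sincosf and the variable names come from the discussion above. On many GPUs float throughput is considerably higher than double throughput, but the actual gain should be measured, and accuracy degrades as the argument grows.

```cuda
#include <cmath>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical single-precision version of the question's kernel.
__global__ void fillPhasorsF(float *d_re, float *d_im, int dataSize,
                             int lower, float deltaF, float twopit)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < dataSize)
    {
        float f = (float)(idx + lower) * deltaF;
        float sinValue, cosValue;
        sincosf(twopit * f, &sinValue, &cosValue);
        d_re[idx] = cosValue;
        d_im[idx] = -sinValue;
    }
}

// Hypothetical host helper: returns the largest absolute difference from a
// double-precision host reference, or -1.0 if a CUDA call fails.
double maxErrorVsHostF(int dataSize, int lower, float deltaF, float twopit)
{
    size_t bytes = (size_t)dataSize * sizeof(float);
    float *d_re = nullptr, *d_im = nullptr;
    if (cudaMalloc(&d_re, bytes) != cudaSuccess) return -1.0;
    if (cudaMalloc(&d_im, bytes) != cudaSuccess) { cudaFree(d_re); return -1.0; }

    int threads = 256;
    int blocks = (dataSize + threads - 1) / threads;
    fillPhasorsF<<<blocks, threads>>>(d_re, d_im, dataSize, lower, deltaF, twopit);

    std::vector<float> re(dataSize), im(dataSize);
    cudaError_t e1 = cudaMemcpy(re.data(), d_re, bytes, cudaMemcpyDeviceToHost);
    cudaError_t e2 = cudaMemcpy(im.data(), d_im, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_re);
    cudaFree(d_im);
    if (e1 != cudaSuccess || e2 != cudaSuccess) return -1.0;

    double maxErr = 0.0;
    for (int i = 0; i < dataSize; ++i)
    {
        // Reference computed in double from the same float inputs.
        double x = (double)twopit * ((double)(i + lower) * (double)deltaF);
        double errRe = std::fabs((double)re[i] - std::cos(x));
        double errIm = std::fabs((double)im[i] + std::sin(x));  // im holds -sin(x)
        if (errRe > maxErr) maxErr = errRe;
        if (errIm > maxErr) maxErr = errIm;
    }
    return maxErr;
}
```

Comparing the value returned by maxErrorVsHostF against the application's error budget is one way to decide whether float is precise enough before switching.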



Source: https://stackoverflow.com/questions/11574789/best-way-to-approach-using-sincos-in-cuda
