Is it possible to call a CUDA CUBLAS function from a global or device function?

Submitted by 懵懂的女人 on 2019-12-20 03:32:06

Question


I'm trying to parallelize an existing application. Most of it is already parallelized and running on the GPU, but I'm having trouble migrating one function to the GPU.

The function calls dtrsv, which is part of the BLAS library; see below.

void dtrsv_call_N(double* B, double* A, int* n, int* lda, int* incx) {
  F77_CALL(dtrsv)("L","T","N", n, B, lda, A, incx);
}

I've been able to call the equivalent CUDA/CUBLAS function from host code, as shown below, and the results match those of the Fortran dtrsv subroutine.

status = cublasDtrsv(handle,CUBLAS_FILL_MODE_LOWER,CUBLAS_OP_T,CUBLAS_DIAG_NON_UNIT, x, dev_m1, x, dev_m2, c);

if (status != CUBLAS_STATUS_SUCCESS) {
    printf("!!!! kernel execution error.\n");
    return EXIT_FAILURE;
}

My problem is that I need to be able to call cublasDtrsv from a __device__ or __global__ function, like this:

__global__ void Dtrsv__cm2(cublasHandle_t handle,cublasFillMode_t uplo,cublasOperation_t trans, cublasDiagType_t diag,int n, const double *A, int lda, double *x, int incx){
    cublasDtrsv(handle,uplo,trans,diag, n, A, lda, x, incx);
}

With CUDA 4.0, compiling the code above produces the error below. Does anyone know whether there is a way to call CUBLAS functions from a __device__ or __global__ function?

error: calling a host function("cublasDtrsv_v2") from a __device__/__global__ function("Dtrsv__dev") is not allowed


Answer 1:


CUDA Toolkit 5.0 introduced a device linker that can link separately compiled device object files. I believe CUBLAS functions can be called from device functions as of CUDA Toolkit 5.0 (but I have only reviewed the headers; I have no experience using CUBLAS).
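
Where a device-callable CUBLAS is not available, one possible workaround is to hand-write the triangular solve as a device function. The following is only a minimal sketch, not CUBLAS code: the names trsv_lower_trans_nonunit and batched_trsv are hypothetical, the solve is serial within each thread, and it assumes incx > 0, a nonzero diagonal, and column-major storage. It mirrors the "L", "T", "N" case from the question, that is, it solves A^T x = b for a lower-triangular A with a non-unit diagonal.

```cpp
#include <cstddef>

// Let the same source build as plain C++ (host only) or with nvcc.
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// Hypothetical helper (not part of CUBLAS): solves A^T * x = b in place.
// A is an n x n lower-triangular, non-unit-diagonal matrix stored
// column-major with leading dimension lda. On entry x holds b; on exit it
// holds the solution. Assumes incx > 0 and a nonzero diagonal.
HOST_DEVICE inline void trsv_lower_trans_nonunit(int n, const double* A, int lda,
                                                 double* x, int incx) {
    // A^T is upper triangular, so use back substitution.
    for (int i = n - 1; i >= 0; --i) {
        double sum = x[i * incx];
        for (int j = i + 1; j < n; ++j) {
            // (A^T)(i, j) == A(j, i), which is stored at A[j + i * lda].
            sum -= A[j + i * lda] * x[j * incx];
        }
        x[i * incx] = sum / A[i + i * lda];
    }
}

#ifdef __CUDACC__
// Hypothetical kernel: each thread solves one independent system.
// Matrices are packed back to back (lda * n doubles each), and each
// right-hand side occupies n contiguous doubles in X.
__global__ void batched_trsv(int n, const double* A, int lda, double* X, int batch) {
    int b = blockIdx.x * blockDim.x + threadIdx.x;
    if (b < batch) {
        trsv_lower_trans_nonunit(n, A + (size_t)b * lda * n, lda,
                                 X + (size_t)b * n, 1);
    }
}
#endif
```

A one-thread-per-system layout like this only pays off when there are many small, independent systems; for a single large solve, the host-side cublasDtrsv call shown in the question is likely the better choice.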



Source: https://stackoverflow.com/questions/12219997/is-it-possible-to-call-a-cuda-cublas-function-from-a-global-or-device-function
