What is the most efficient way to transpose a matrix in CUDA?

Backend · Unresolved · 3 answers · 1774 views
予麋鹿 2021-01-06 16:27

I have an M*N matrix in host memory, and when copying it into device memory I need it transposed into an N*M matrix. Is there any CUDA (cuBLAS...)

3 answers
  •  一向 (original poster)
    2021-01-06 17:30

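    In the cuBLAS API, use the geam family of functions, for example cublasSgeam() for single precision:

    cublas<t>geam()

    This function performs the matrix-matrix addition/transposition C = alpha * op(A) + beta * op(B).
    The user can transpose matrix A by setting *alpha = 1 and *beta = 0
    (and specifying the transa operator as CUBLAS_OP_T for transpose).

    Note that this is an out-of-place operation on the device: the matrix is first copied to device memory as-is and then transposed into a second device buffer.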
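
The call described above can be sketched as follows. This is a minimal example under stated assumptions, not a definitive implementation: the element type is assumed to be float (hence cublasSgeam), the host matrix is assumed to be stored row-major, and the sizes M = 3, N = 4 and the CHECK_CUDA / CHECK_CUBLAS macros are hypothetical, introduced only for illustration. Because cuBLAS treats buffers as column-major, the row-major M*N input is seen as an N*M column-major matrix with leading dimension N, and the output is written as an M*N column-major matrix, which is the same memory layout as a row-major N*M matrix.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>
#include <cublas_v2.h>

// Hypothetical helper macros (not part of CUDA or cuBLAS): abort on any error.
#define CHECK_CUDA(call)                                                  \
    do {                                                                  \
        cudaError_t e = (call);                                           \
        if (e != cudaSuccess) {                                           \
            fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(e));   \
            exit(EXIT_FAILURE);                                           \
        }                                                                 \
    } while (0)

#define CHECK_CUBLAS(call)                                                \
    do {                                                                  \
        cublasStatus_t s = (call);                                        \
        if (s != CUBLAS_STATUS_SUCCESS) {                                 \
            fprintf(stderr, "cuBLAS error: %d\n", (int)s);                \
            exit(EXIT_FAILURE);                                           \
        }                                                                 \
    } while (0)

int main() {
    const int M = 3, N = 4;  // illustrative sizes: M rows, N columns (row-major)
    std::vector<float> h_in(M * N), h_out(N * M);
    for (int i = 0; i < M * N; ++i) h_in[i] = (float)i;

    float *d_in = nullptr, *d_out = nullptr;
    CHECK_CUDA(cudaMalloc(&d_in, M * N * sizeof(float)));
    CHECK_CUDA(cudaMalloc(&d_out, N * M * sizeof(float)));
    CHECK_CUDA(cudaMemcpy(d_in, h_in.data(), M * N * sizeof(float),
                          cudaMemcpyHostToDevice));

    cublasHandle_t handle;
    CHECK_CUBLAS(cublasCreate(&handle));

    // C = alpha * A^T + beta * B with alpha = 1, beta = 0.
    // In column-major terms: A is N x M (lda = N), C is M x N (ldc = M).
    // B is not read when beta == 0; d_out is passed so that the pointer
    // and leading dimension are still valid arguments.
    const float alpha = 1.0f, beta = 0.0f;
    CHECK_CUBLAS(cublasSgeam(handle, CUBLAS_OP_T, CUBLAS_OP_N,
                             M, N,
                             &alpha, d_in, N,
                             &beta, d_out, M,
                             d_out, M));

    CHECK_CUDA(cudaMemcpy(h_out.data(), d_out, N * M * sizeof(float),
                          cudaMemcpyDeviceToHost));

    // Verify against the definition of a transpose: out[j][i] == in[i][j].
    bool ok = true;
    for (int i = 0; i < M; ++i)
        for (int j = 0; j < N; ++j)
            if (h_out[j * M + i] != h_in[i * N + j]) ok = false;
    printf("%s\n", ok ? "transpose OK" : "transpose MISMATCH");

    CHECK_CUBLAS(cublasDestroy(handle));
    CHECK_CUDA(cudaFree(d_in));
    CHECK_CUDA(cudaFree(d_out));
    return ok ? 0 : 1;
}
```

This should build with something like `nvcc transpose.cu -lcublas` (the file name is arbitrary). For double precision, cublasDgeam takes the same arguments with double instead of float.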
