Say I have a matrix of dimension A*B on the GPU, where B (the number of columns) is the leading dimension, assuming C-style (row-major) storage. Is there any method in
As asked in the title: to transpose a device row-major matrix A[m][n] into a row-major A[n][m], one can do it this way:
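float* clone = ...; // device copy of A (m*n floats); geam does not support an in-place transpose
float const alpha(1.0f);
float const beta(0.0f);
cublasHandle_t handle;
cublasCreate(&handle);
// Column-major view: clone is n x m (lda = n); its transpose, m x n (ldc = m), is written to A.
// beta is 0, so the second operand (clone, ldb = m) does not contribute to the result.
cublasSgeam( handle, CUBLAS_OP_T, CUBLAS_OP_N, m, n, &alpha, clone, n, &beta, clone, m, A, m );
cublasDestroy(handle);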
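To see why those leading dimensions work, here is a CPU-only sketch of the same index arithmetic. Note that geam_tn_colmajor and transpose_rowmajor are hypothetical names I made up for illustration; the first is a naive stand-in for what cublasSgeam computes with CUBLAS_OP_T / CUBLAS_OP_N, not the cuBLAS implementation itself.

```cpp
#include <cassert>
#include <vector>

// Hypothetical CPU stand-in for cublasSgeam with transa = T, transb = N:
//   C (m x n, column-major, ldc) = alpha * A^T + beta * B
// where A is n x m column-major (lda) and B is m x n column-major (ldb).
void geam_tn_colmajor(int m, int n, float alpha, const float* A, int lda,
                      float beta, const float* B, int ldb, float* C, int ldc) {
    assert(lda >= n && ldb >= m && ldc >= m);
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            // Skip reading B when beta is zero, so B may hold anything.
            float b = (beta == 0.0f) ? 0.0f : beta * B[i + j * ldb];
            // Element (i, j) of A^T is element (j, i) of A.
            C[i + j * ldc] = alpha * A[j + i * lda] + b;
        }
    }
}

// Transpose a row-major m x n matrix into a row-major n x m matrix,
// passing the same argument pattern as the cublasSgeam call above.
std::vector<float> transpose_rowmajor(const std::vector<float>& src, int m, int n) {
    std::vector<float> dst(src.size());
    geam_tn_colmajor(m, n, 1.0f, src.data(), n, 0.0f, src.data(), m, dst.data(), m);
    return dst;
}
```

For example, calling transpose_rowmajor on {1, 2, 3, 4, 5, 6} with m = 2, n = 3 yields {1, 4, 2, 5, 3, 6}, i.e. the 3 x 2 row-major transpose.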
And to multiply two row-major matrices A[m][k] and B[k][n], C = A*B (using a handle that is still valid, i.e. before cublasDestroy):
cublasSgemm( handle, CUBLAS_OP_N, CUBLAS_OP_N, n, m, k, &alpha, B, n, A, k, &beta, C, n );
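where C[m][n] is also a row-major matrix. This works because a row-major matrix read as column-major is its transpose, so the call actually computes C^T = B^T * A^T in column-major terms, which is exactly C in row-major storage.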
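The operand swap can be checked on the CPU with a small sketch. Here gemm_nn_colmajor is a hypothetical naive stand-in for cublasSgemm with both operations set to CUBLAS_OP_N, and matmul_rowmajor is a made-up wrapper that passes the same argument pattern as the call above.

```cpp
#include <cassert>
#include <vector>

// Hypothetical CPU stand-in for cublasSgemm with transa = transb = N:
//   C (m x n) = alpha * A (m x k) * B (k x n) + beta * C, all column-major.
void gemm_nn_colmajor(int m, int n, int k, float alpha, const float* A, int lda,
                      const float* B, int ldb, float beta, float* C, int ldc) {
    assert(lda >= m && ldb >= k && ldc >= m);
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            float acc = 0.0f;
            for (int p = 0; p < k; ++p) {
                acc += A[i + p * lda] * B[p + j * ldb];
            }
            // Skip reading C when beta is zero, so C may be uninitialized.
            float c = (beta == 0.0f) ? 0.0f : beta * C[i + j * ldc];
            C[i + j * ldc] = alpha * acc + c;
        }
    }
}

// Row-major C[m][n] = A[m][k] * B[k][n], computed as column-major
// C^T (n x m) = B^T (n x k) * A^T (k x m) by swapping the operands.
std::vector<float> matmul_rowmajor(const std::vector<float>& A,
                                   const std::vector<float>& B,
                                   int m, int k, int n) {
    std::vector<float> C(static_cast<std::size_t>(m) * n);
    gemm_nn_colmajor(n, m, k, 1.0f, B.data(), n, A.data(), k, 0.0f, C.data(), n);
    return C;
}
```

With A = {1, 2, 3, 4, 5, 6} as a 2 x 3 matrix and B = {7, 8, 9, 10, 11, 12} as a 3 x 2 matrix, matmul_rowmajor returns {58, 64, 139, 154}, the row-major 2 x 2 product.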