CULA has auxiliary routines to compute the transpose (culaDevice?geTranspose, where ? stands for the data-type letter). For a square matrix you could also use in-place transposition (culaDevice?geTransposeInplace).
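To make clear what those two routines are expected to compute, here is a minimal CPU reference sketch in C. It does not call CULA: `transpose_ref` and `transpose_inplace_ref` are hypothetical helper names made up for illustration, and the column-major layout with leading dimensions (`lda`, `ldb`) is an assumption based on the usual LAPACK convention. Check the CULA reference manual for the actual signatures before relying on this.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference helper (NOT a CULA function).
 * Out-of-place transpose: A is m x n, B becomes n x m.
 * Both are assumed column-major with leading dimensions lda and ldb. */
void transpose_ref(int m, int n, const float *a, int lda, float *b, int ldb)
{
    assert(lda >= m && ldb >= n);
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < m; ++i)
            b[j + (size_t)i * ldb] = a[i + (size_t)j * lda];
}

/* Hypothetical reference helper (NOT a CULA function).
 * In-place transpose of a square n x n matrix: swap each element
 * below the diagonal with its mirror above the diagonal. */
void transpose_inplace_ref(int n, float *a, int lda)
{
    assert(lda >= n);
    for (int j = 0; j < n; ++j) {
        for (int i = j + 1; i < n; ++i) {
            float tmp = a[i + (size_t)j * lda];
            a[i + (size_t)j * lda] = a[j + (size_t)i * lda];
            a[j + (size_t)i * lda] = tmp;
        }
    }
}
```

The in-place variant only works for square matrices because each element is swapped with its mirror image; for a non-square matrix the result has a different shape, so a separate output buffer is needed.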
Note: CULA offers a free license if you meet certain conditions.