I am trying to use Unified Memory with cudaMallocManaged() with the cuBLAS library. I am performing a simple matrix to vector multiplication as a simple example, and storing