I am having problems copying data from my device back to the host. My data are arranged in a struct:
typedef struct Array2D { double* arr; in
Copying device-allocated memory with cudaMemcpy is a known limitation in CUDA 4.1. A fix is in the works and will be released in a future version of the CUDA runtime.
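In the meantime, a possible workaround is to allocate the buffer from the host with cudaMalloc rather than inside a kernel. Memory obtained from in-kernel malloc lives on the device heap, and host runtime calls such as cudaMemcpy and cudaFree cannot operate on it. The sketch below is only an illustration under assumptions: the original struct definition is cut off, so the `rows` and `cols` fields, the `fill` kernel, and the `roundTrip` helper are all hypothetical names, not taken from the question.

```cuda
#include <cstddef>
#include <vector>
#include <cuda_runtime.h>

// Assumed layout: only `arr` appears in the question; rows and cols are hypothetical.
struct Array2D {
    double* arr;  // device pointer, allocated from the host with cudaMalloc
    int rows;
    int cols;
};

// Hypothetical kernel: writes each element's linear index so the host can verify the copy.
__global__ void fill(Array2D a) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < a.rows * a.cols) {
        a.arr[i] = static_cast<double>(i);
    }
}

// Hypothetical helper: returns true if the data copied back matches what the kernel wrote.
bool roundTrip(int rows, int cols) {
    if (rows <= 0 || cols <= 0) return false;

    Array2D d;
    d.rows = rows;
    d.cols = cols;
    d.arr = nullptr;

    const size_t n = static_cast<size_t>(rows) * static_cast<size_t>(cols);

    // Host-side allocation: this pointer is valid for cudaMemcpy and cudaFree.
    if (cudaMalloc(&d.arr, n * sizeof(double)) != cudaSuccess) return false;

    // The struct is passed by value; it carries the device pointer to the kernel.
    const int threads = 128;
    const int blocks = static_cast<int>((n + threads - 1) / threads);
    fill<<<blocks, threads>>>(d);

    // cudaMemcpy waits for the kernel to finish before copying device to host.
    std::vector<double> host(n);
    cudaError_t err = cudaMemcpy(host.data(), d.arr, n * sizeof(double),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d.arr);
    if (err != cudaSuccess) return false;

    for (size_t i = 0; i < n; ++i) {
        if (host[i] != static_cast<double>(i)) return false;
    }
    return true;
}
```

Calling `roundTrip(4, 3)` from `main` should return true on a machine with a working CUDA device. The design choice here is that the struct itself stays on the host and only the `arr` buffer lives in device memory, so a single cudaMemcpy on `d.arr` is enough to bring the data back.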