I am having problems copying data from my device back to the host. My data are arranged in a struct:
typedef struct Array2D {
double* arr;
in
I tried allocating the pointer Array2D->arr on the host using cudaMalloc instead of allocating it on the device using malloc. After that, the code works as intended.
It looks very much like the problem described in the thread (http://forums.nvidia.com/index.php?showtopic=222659) on nVidia's forum that Pavan referred to in the comments to the question.
I think that probably closes the question for now, as the code works fine. However, if anyone has a proposal for a solution which utilizes malloc on the device, feel free to post.