I have data stored as arrays of floats (single precision). I have one array for my real data, and one array for my complex data, which I use as the input to FFTs. I need t
You could do this as part of a host-> device copy. Each copy would take one of the contiguous input arrays on the host and copy it in strided fashion to the device. The storage layout of complex data types in CUDA is compatible with the layout defined for complex types in Fortran and C++, i.e. as a structure with the real part followed by imaginary part.
float * real_vec; // host vector, real part
float * imag_vec; // host vector, imaginary part
float2 * complex_vec_d; // device vector, single-precision complex
float * tmp_d = (float *) complex_vec_d;
cudaStat = cudaMemcpy2D (tmp_d, 2 * sizeof(tmp_d[0]),
real_vec, 1 * sizeof(real_vec[0]),
sizeof(real_vec[0]), n, cudaMemcpyHostToDevice);
cudaStat = cudaMemcpy2D (tmp_d + 1, 2 * sizeof(tmp_d[0]),
imag_vec, 1 * sizeof(imag_vec[0]),
sizeof(imag_vec[0]), n, cudaMemcpyHostToDevice);