std::vector to array in CUDA

Question

Is there a way to convert a 2D vector into an array to be able to use it in CUDA kernels?

It is declared as:

vector<vector<int>> information;

I want to cudaMalloc device memory and copy the data from host to device. What would be the best way to do it?

int *d_information;
cudaMalloc((void**)&d_information, sizeof(int)*size);
cudaMemcpy(d_information, information, sizeof(int)*size, cudaMemcpyHostToDevice);

Answer 1:

In a word, no, there isn't. The CUDA API doesn't support deep copying and doesn't know anything about std::vector. If you insist on having a vector of vectors as the host source, you will need to do something like this:

int *d_information;
cudaMalloc((void**)&d_information, sizeof(int)*size);

int *dst = d_information;
for (std::vector<std::vector<int> >::iterator it = information.begin() ; it != information.end(); ++it) {
    const int *src = it->data();  // unlike &(*it)[0], valid even if the inner vector is empty
    size_t sz = it->size();

    cudaMemcpy(dst, src, sizeof(int)*sz, cudaMemcpyHostToDevice);
    dst += sz;
}

[disclaimer: written in browser, not compiled or tested. Use at own risk]

This copies the host memory into a single allocation in GPU linear memory, at the cost of one cudaMemcpy call per inner vector. If the vector of vectors is a "jagged" array (inner vectors of different lengths), you will also want to store an index somewhere for the GPU to use, for example the starting offset of each row.
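One possible form for that index is a table of row offsets, built on the host before the copy. The following is a minimal sketch, not part of the original answer: the helper name build_row_offsets is hypothetical, and it assumes the rows are packed back to back in the device buffer in the same order as in the loop above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the original answer).
// offsets[i] is the position of the first element of row i in the packed
// buffer; offsets.back() is the total number of elements across all rows.
std::vector<std::size_t> build_row_offsets(const std::vector<std::vector<int>>& rows) {
    std::vector<std::size_t> offsets;
    offsets.reserve(rows.size() + 1);
    offsets.push_back(0);
    for (const auto& row : rows) {
        // Running sum of row lengths; empty rows simply repeat the offset.
        offsets.push_back(offsets.back() + row.size());
    }
    assert(offsets.size() == rows.size() + 1);
    return offsets;
}
```

The last entry is the total element count, so it can serve as the size passed to cudaMalloc. If the offsets are copied to the device as well (say into an array d_offsets, also a hypothetical name), a kernel can find element j of row i at d_information[d_offsets[i] + j], and the length of row i is d_offsets[i + 1] - d_offsets[i].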

Answer 2:

As far as I understand, a vector of vectors does not need to reside in contiguous memory, i.e. the inner vectors can be scattered across the heap.

Depending on the amount of memory you need to transfer, I would do one of two things:

Reorder your memory to be a single vector, and then use a single cudaMemcpy.
Issue a series of cudaMemcpyAsync calls, where each copy handles a single vector in your vector of vectors, and then synchronize.
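The first option can be sketched on the host side as follows. This is an illustration only, not the answerer's code: the function name flatten is hypothetical, and it assumes row-major packing (row 0 first, then row 1, and so on).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the original answer).
// Packs all inner vectors back to back into one contiguous std::vector<int>.
std::vector<int> flatten(const std::vector<std::vector<int>>& rows) {
    // First pass: count the elements so the buffer is allocated only once.
    std::size_t total = 0;
    for (const auto& row : rows) {
        total += row.size();
    }

    // Second pass: append each row in order.
    std::vector<int> flat;
    flat.reserve(total);
    for (const auto& row : rows) {
        flat.insert(flat.end(), row.begin(), row.end());
    }
    assert(flat.size() == total);
    return flat;
}
```

Because std::vector<int> stores its elements contiguously, the result can then go to the device in one call, along the lines of cudaMemcpy(d_information, flat.data(), sizeof(int) * flat.size(), cudaMemcpyHostToDevice), with d_information allocated for flat.size() ints. The trade-off is an extra host-side copy of the data in exchange for a single transfer.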

Source: https://stackoverflow.com/questions/17570399/stdvector-to-array-in-cuda

Tags

cuda

gpgpu