How can I create global variables in CUDA? Could you please give me an example?
How can I create arrays inside a CUDA function, for example inside a __global__ kernel?
The C++ new
operator is supported on compute capability 2.0 and 2.1 (i.e. Fermi) devices with CUDA 4.0, so you could use new
to allocate global memory onto a device symbol, although neither of your first two code snippets is how it would be done in practice.
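As a rough sketch of what that would look like (the kernel names here are hypothetical; device-side new draws from the runtime device heap, not from cudaMalloc):

```cuda
__device__ float *buf;  // device symbol holding a pointer

// Device-side new requires compute capability >= 2.0 and CUDA 4.0+.
// Run with a single thread so the symbol is assigned exactly once.
__global__ void alloc_buf(int n)
{
    if (threadIdx.x == 0 && blockIdx.x == 0)
        buf = new float[n];   // visible to later kernels via the symbol
}

__global__ void free_buf()
{
    if (threadIdx.x == 0 && blockIdx.x == 0)
        delete [] buf;        // device new must be paired with device delete
}
```

Note that memory allocated with device-side new cannot be freed with cudaFree, and its heap size is limited (adjustable via cudaDeviceSetLimit with cudaLimitMallocHeapSize).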
On older hardware, and/or with pre-CUDA 4.0 toolkits, the standard approach is to use the cudaMemcpyToSymbol API in host code:
__device__ float *a;

int main()
{
    const size_t sz = 10 * sizeof(float);
    float *ad;
    // Allocate device memory, then copy the device pointer onto the symbol.
    cudaMalloc((void **)&ad, sz);
    // Pass the symbol itself; the string form ("a") is deprecated and was
    // removed in CUDA 5.0.
    cudaMemcpyToSymbol(a, &ad, sizeof(float *), size_t(0), cudaMemcpyHostToDevice);
    return 0;
}
which copies a dynamically allocated device pointer onto a symbol that can then be used directly in device code.
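For completeness, device code can then index the symbol without it being passed as a kernel argument. A minimal sketch (the kernel name and launch configuration are illustrative):

```cuda
__device__ float *a;  // same symbol as above

__global__ void scale(float s)
{
    // 'a' is referenced directly; no pointer argument is needed
    a[threadIdx.x] *= s;
}

// Host side, after the cudaMemcpyToSymbol call shown above:
//     scale<<<1, 10>>>(2.0f);
```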
EDIT: Answering this question is a bit like hitting a moving target. For the constant memory case you now seem interested in, here is a complete working example:
#include <cstdio>

#define nn (10)

__constant__ float a[nn];

__global__ void kernel(float *out)
{
    if (threadIdx.x < nn)
        out[threadIdx.x] = a[threadIdx.x];
}

int main()
{
    const size_t sz = size_t(nn) * sizeof(float);
    const float avals[nn] = { 1., 2., 3., 4., 5., 6., 7., 8., 9., 10. };
    float ah[nn];

    // Pass the symbol itself rather than its name as a string; the string
    // form was removed in CUDA 5.0.
    cudaMemcpyToSymbol(a, &avals[0], sz, size_t(0), cudaMemcpyHostToDevice);

    float *ad;
    cudaMalloc((void **)&ad, sz);
    kernel<<<dim3(1), dim3(16)>>>(ad);
    cudaMemcpy(&ah[0], ad, sz, cudaMemcpyDeviceToHost);

    for (int i = 0; i < nn; i++) {
        printf("%d %f\n", i, ah[i]);
    }

    cudaFree(ad);
    return 0;
}
This shows copying data onto a constant memory symbol and reading that data inside a kernel.
On another note, the interweb is overflowing with well answered questions, tutorials, lecture notes, videos, ebooks, sample code and documentation on the basics of CUDA programming. Five minutes with the search engine of your choice would get you answers to every one of these questions you have been asking over the last few days. Perhaps it is time to do exactly that.