Is there a way in CUDA to allocate memory dynamically in device-side functions? I could not find any examples of doing this.
According to http://developer.download.nvidia.com/compute/cuda/3_2_prod/toolkit/docs/CUDA_C_Programming_Guide.pdf you should be able to use malloc() and free() in a device function.

From the CUDA C Programming Guide, page 122:

B.15 Dynamic Global Memory Allocation

    void* malloc(size_t size);
    void free(void* ptr);

allocate and free memory dynamically from a fixed-size heap in global memory.
The example given in the guide:
__global__ void mallocTest()
{
    char* ptr = (char*)malloc(123);
    printf("Thread %d got pointer: %p\n", threadIdx.x, ptr);
    free(ptr);
}

int main()
{
    // Set a heap size of 128 megabytes. Note that this must
    // be done before any kernel is launched.
    cudaThreadSetLimit(cudaLimitMallocHeapSize, 128*1024*1024);
    mallocTest<<<1, 5>>>();
    cudaThreadSynchronize();
    return 0;
}
You need the compiler parameter -arch=sm_20 and a card with compute capability 2.0 or higher (Fermi architecture or newer).
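One caveat the guide's example skips: device-side malloc() returns NULL when the fixed-size heap is exhausted, so the result should be checked before use. Below is a minimal sketch of that (the kernel name and buffer sizes are my own); it also uses cudaDeviceSetLimit and cudaDeviceSynchronize, which replace the cudaThread* calls deprecated in newer toolkits:

```cuda
#include <cstdio>

// Illustrative kernel: each thread allocates its own scratch buffer,
// checks for heap exhaustion, uses it, and frees it.
__global__ void perThreadScratch(int bytesPerThread)
{
    char* buf = (char*)malloc(bytesPerThread);
    if (buf == NULL) {
        // malloc() returns NULL when the device heap is exhausted,
        // so always check before dereferencing.
        printf("Thread %d: allocation failed\n", threadIdx.x);
        return;
    }
    buf[0] = (char)threadIdx.x;  // use the memory
    free(buf);                   // without free(), the allocation
                                 // persists across kernel launches
}

int main()
{
    // cudaDeviceSetLimit supersedes the deprecated cudaThreadSetLimit;
    // the heap size must still be set before the first kernel launch.
    cudaDeviceSetLimit(cudaLimitMallocHeapSize, 128 * 1024 * 1024);
    perThreadScratch<<<1, 5>>>(256);
    cudaDeviceSynchronize();
    return 0;
}
```

Note that memory obtained from device-side malloc() lives in the device heap and cannot be freed with host-side cudaFree(), and vice versa.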