Dearth of CUDA 5 Dynamic Parallelism Examples

问题

I've been googling around and have only been able to find a trivial example of the new dynamic parallelism in Compute Capability 3.0 in one of their Tech Briefs linked from here. I'm aware that the HPC-specific cards probably won't be available until this time next year (after the nat'l labs get theirs). And yes, I realize that the simple example they gave is enough to get you going, but the more the merrier.

Are there other examples I've missed?

To save you the trouble, here is the entire example given in the tech brief:

__global__ ChildKernel(void* data){
    //Operate on data
}
__global__ ParentKernel(void *data){
    ChildKernel<<<16, 1>>>(data);
}
// In Host Code
ParentKernel<<<256, 64>>(data);

// Recursion is also supported
__global__ RecursiveKernel(void* data){
    if(continueRecursion == true)
        RecursiveKernel<<<64, 16>>>(data);
}

EDIT: The GTC talk New Features In the CUDA Programming Model focused mostly on the new Dynamic Parallelism in CUDA 5. The link has the video and slides. Still only toy examples, but a lot more detail than the tech brief above.

回答1:

Here is what you need, the Dynamic parallelism programming guide. Full of details and examples: http://docs.nvidia.com/cuda/pdf/CUDA_Dynamic_Parallelism_Programming_Guide.pdf

回答2:

Just to confirm that dynamic parallelism is only supported on GPU's with a compute capability of 3.5 upwards.

I have a 3.0 GPU with cuda 5.0 installed I have compiled the Dynamic Parallelism examples nvcc -arch=sm_30 test.cu

and received the below compile error test.cu(10): error: calling a global function("child_launch") from a global function("parent_launch") is only allowed on the compute_35 architecture or above.

GPU info

Device 0: "GeForce GT 640" CUDA Driver Version / Runtime Version 5.0 / 5.0 CUDA Capability Major/Minor version number: 3.0

hope this helps

回答3:

I edited the question title to "...CUDA 5...", since Dynamic Parallelism is new in CUDA 5, not CUDA 4. We don't have any public examples available yet, because we don't have public hardware available that can run them. CUDA 5.0 will support dynamic parallelism but only on Compute Capability 3.5 and later (GK110, for example). These will be available later in the year.

We will release some examples with a CUDA 5 release candidate closer to the time the hardware is available.

回答4:

I think compute capability 3.0 doesn´t include dynamic paralelism. It will be included in the GK110 architecture (aka "Big Kepler"), I don´t know what compute capability number will have assigned (3.1? maybe). Those cards won´t be available until late this year (I´m waiting sooo much for those). As far as I know the 3.0 corresponds to the GK104 chips like the GTX690 o the GT640M for laptops.

回答5:

Just wanted to check in with you all given that the CUDA 5 RC was released recently. I looked in the SDK examples and wasn't able to find any dynamic parallelism there. Someone correct me if I'm wrong. I searched for kernel launches within kernels by grepping for "<<<" and found nothing.

来源：https://stackoverflow.com/questions/10854325/dearth-of-cuda-5-dynamic-parallelism-examples

标签

cuda

nvidia

hpc