dynamic-parallelism

CMake to generate a MSVC CUDA project that targets newer devices

南笙酒味 submitted on 2019-12-11 18:39:23
Question: My PC has a GTX 580 (compute capability 2.0). I want to compile a CUDA source that uses dynamic parallelism, a feature introduced in compute capability 3.5. I know I will not be able to run the program on my GPU; however, it should be possible to compile this code on my machine. I'm assuming this because I can compile the CUDA samples that use compute capability 3.5 with no problems. These samples come with Visual Studio projects that were "manually generated" (I guess). I believe my problem is with

Dynamic parallelism - launching many small kernels is very slow

爱⌒轻易说出口 submitted on 2019-12-05 04:01:44
Question: I am trying to use dynamic parallelism to improve an algorithm I have in CUDA. In my original CUDA solution, every thread computes a number that is common to its whole block. What I want to do is first launch a coarse (low-resolution) kernel in which each thread computes the common value just once (as if every thread represented one block). Then each thread creates a small grid of 1 block (16x16 threads) and launches a child kernel for it, passing the common value. In theory it should be faster

Dynamic parallelism - launching many small kernels is very slow

╄→гoц情女王★ submitted on 2019-12-03 21:50:29
I am trying to use dynamic parallelism to improve an algorithm I have in CUDA. In my original CUDA solution, every thread computes a number that is common to its whole block. What I want to do is first launch a coarse (low-resolution) kernel in which each thread computes the common value just once (as if every thread represented one block). Then each thread creates a small grid of 1 block (16x16 threads) and launches a child kernel for it, passing the common value. In theory it should be faster because it saves many redundant operations. But in practice, the solution runs very slowly, I don't
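
The launch pattern described in this question can be sketched roughly as follows. This is only an illustration under stated assumptions, not the asker's code: the kernel names `parentKernel` and `childKernel`, the output layout, and the placeholder computation of the common value are all hypothetical. It assumes a device of compute capability 3.5 or higher and a toolkit that still accepts `-arch=sm_35`.

```cuda
// Hypothetical sketch of the coarse-parent / small-child pattern.
// Assumed build line (adjust -arch to your device and toolkit):
//   nvcc -arch=sm_35 -rdc=true coarse_fine.cu -lcudadevrt -o coarse_fine
#include <cstdio>
#include <cmath>
#include <cuda_runtime.h>

// Child kernel: a single 16x16 block that reuses the value
// its parent thread computed once.
__global__ void childKernel(float common, float *out, int tile)
{
    int local = threadIdx.y * blockDim.x + threadIdx.x;   // 0..255
    int tileSize = blockDim.x * blockDim.y;               // 256
    out[tile * tileSize + local] = common * local;
}

// Parent (coarse) kernel: one thread per tile. Each thread computes the
// common value once, then launches one small child grid for its tile.
__global__ void parentKernel(float *out, int numTiles)
{
    int tile = blockIdx.x * blockDim.x + threadIdx.x;
    if (tile >= numTiles) return;

    float common = sqrtf((float)tile);   // placeholder for the shared value

    // Device-side launch: 1 block of 16x16 threads per parent thread.
    childKernel<<<1, dim3(16, 16)>>>(common, out, tile);
}

int main()
{
    const int numTiles = 64;             // hypothetical problem size
    const int tileSize = 16 * 16;
    float *out = nullptr;
    cudaMalloc(&out, numTiles * tileSize * sizeof(float));

    parentKernel<<<(numTiles + 31) / 32, 32>>>(out, numTiles);

    // The host sync returns only after parent and child grids finish.
    cudaError_t err = cudaDeviceSynchronize();
    printf("status: %s\n", cudaGetErrorString(err));

    cudaFree(out);
    return err == cudaSuccess ? 0 : 1;
}
```

Each parent thread here issues its own 256-thread child grid, so the number of device-side launches grows with the number of tiles, which is the situation the question title describes.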

Generating Relocatable Device Code using Nvidia Nsight

此生再无相见时 submitted on 2019-12-01 12:58:24
I'm trying to compile a dynamic parallelism example in CUDA, and when I try to compile it I get an error saying "kernel launch from __device__ or __global__ functions requires separate compilation modes". I later found that I have to set the --relocatable-device-code flag to true. But is there a setting in Nsight Eclipse that sets relocatable-device-code to true? If you are not using a makefile project, you can change the options passed to nvcc for an Nsight project at the following location, starting from the menu: Project - Properties - Build - Settings - Tool Settings - NVCC
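
For reference, a minimal source file that needs this flag might look like the sketch below. The kernel names `parent` and `child` and the file name are hypothetical; the command line in the comment is an assumption about a typical nvcc invocation outside Nsight, using the `--relocatable-device-code=true` flag named above plus linking against the device runtime library `cudadevrt`.

```cuda
// Minimal dynamic-parallelism sketch (hypothetical names).
// Assumed command-line build (adjust -arch to your device and toolkit):
//   nvcc -arch=sm_35 --relocatable-device-code=true dp_min.cu -lcudadevrt -o dp_min
// Building without relocatable device code is what produces the
// "kernel launch from __device__ or __global__ functions" error.
#include <cstdio>
#include <cuda_runtime.h>

// Child kernel launched from the device.
__global__ void child()
{
    printf("child thread %d\n", threadIdx.x);
}

// Parent kernel: the launch inside a __global__ function is the
// construct that requires separate compilation.
__global__ void parent()
{
    child<<<1, 4>>>();
}

int main()
{
    parent<<<1, 1>>>();
    // Wait for the parent grid and its child grid to complete.
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```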

Generating Relocatable Device Code using Nvidia Nsight

♀尐吖头ヾ submitted on 2019-12-01 11:39:04
Question: I'm trying to compile a dynamic parallelism example in CUDA, and when I try to compile it I get an error saying "kernel launch from __device__ or __global__ functions requires separate compilation modes". I later found that I have to set the --relocatable-device-code flag to true. But is there a setting in Nsight Eclipse that sets relocatable-device-code to true? Answer 1: If you are not using a makefile project, you can change the options passed to nvcc for an Nsight project at the