nvcc

CUDA compile problems on Windows, CMake error: No CUDA toolset found

Submitted by 早过忘川 on 2019-12-11 11:46:30
Question: I've been working on my CUDA program successfully on Linux, but I would like to support Windows as well. However, I've been struggling to compile it correctly there. I use: Windows 10, CMake 3.15, Visual Studio 2017, CUDA Toolkit 10.1. When I use the old, deprecated CMake CUDA support via find_package(CUDA 10.1 REQUIRED), it correctly reports the path to the toolkit. However, it is my understanding that the latest CMake does not properly support the old …
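For reference, a minimal sketch of CMake's first-class CUDA language support, which replaces the deprecated FindCUDA module (project and file names are illustrative). With Visual Studio generators, the "No CUDA toolset found" error typically means the CUDA Visual Studio integration was not installed for that particular VS version:

```cmake
cmake_minimum_required(VERSION 3.15)
# First-class CUDA support: no find_package(CUDA) needed
project(myapp LANGUAGES CXX CUDA)

add_executable(myapp main.cpp kernels.cu)
set_target_properties(myapp PROPERTIES CUDA_SEPARABLE_COMPILATION ON)
```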

Passing CUDA function pointers with libraries

Submitted by 萝らか妹 on 2019-12-11 06:12:40
Question: I'm using CUDA and attempting to use a function pointer to pass a CUDA function to a library that later uses this function in its device kernel, similar to the CUDA function pointer example. The important sections of the code are: /** Type definition for the execution function in #qsched_run. */ typedef void (*qsched_funtype)(int, void *); __device__ void gpuTest(int type, void *data) { .... } __device__ qsched_funtype function = gpuTest; void main(...) { // Various initialization setup. …
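The pattern in the excerpt usually continues by copying the device-side pointer value back to the host with cudaMemcpyFromSymbol, then handing it to the library; a sketch following the excerpt's names (qsched_run is the hypothetical library entry point):

```cuda
typedef void (*qsched_funtype)(int, void *);

__device__ void gpuTest(int type, void *data) { /* ... */ }

// Device-side variable holding the function pointer; host code cannot
// read it directly, hence the cudaMemcpyFromSymbol below.
__device__ qsched_funtype function = gpuTest;

int main() {
  qsched_funtype host_copy;
  // Copy the device pointer value to the host, then pass it to the
  // library, which forwards it back into its own device kernels.
  cudaMemcpyFromSymbol(&host_copy, function, sizeof(qsched_funtype));
  // qsched_run(..., host_copy);  // hypothetical library call
  return 0;
}
```

One caveat: the pointer is only callable from the library's kernels if both sides are compiled with relocatable device code (`-rdc=true`) and device-linked together.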

Compilation error with nvcc and c++11, need minimal failing example

Submitted by 百般思念 on 2019-12-11 05:48:28
Question: The following code (originally from Boost) fails to compile with nvcc 7.0 when C++11 support is enabled: #include <memory> template<typename T> struct result_of_always_void { typedef void type; }; template<typename F, typename Enable = void> struct cpp0x_result_of_impl {}; template<typename F, typename T0> struct cpp0x_result_of_impl<F(T0), typename result_of_always_void<decltype(std::declval<F>()(std::declval<T0>()))>::type> { typedef decltype(std::declval<F>()(std::declval<T0>())) type; }; …

Tell NVCC to NOT preprocess host code to avoid BOOST_COMPILER redefinition

Submitted by 主宰稳场 on 2019-12-11 03:28:10
Question: I have a .cu file that contains both host and device code: // device code __global__ void myKernel() { ... } // host code #include <boost/thread/mutex.hpp> boost::mutex myMutex; int main() { ... } As you can see, I include Boost's mutex functionality. When I compile the file, I get an error because of the following warning: warning C4005: 'BOOST_COMPILER': Macro-Redefinition c:\boost\include\boost-1_49_0\boost\config\compiler\visualc.hpp So I assume that nvcc handles all the preprocessing …
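nvcc does preprocess and parse host code in a .cu file, which is what trips Boost's compiler detection here. The usual workaround is to keep Boost-using host code in a .cpp translation unit compiled by the host compiler, and expose the kernel through a thin wrapper; a sketch (file split and wrapper name are illustrative):

```cuda
// kernels.cu -- compiled by nvcc; no Boost headers included here
__global__ void myKernel() { /* ... */ }

// Plain wrapper that host code can call without seeing CUDA syntax.
extern "C" void launchMyKernel() { myKernel<<<1, 1>>>(); }

// main.cpp -- compiled by the host compiler, so Boost sees cl/gcc directly:
// #include <boost/thread/mutex.hpp>
// extern "C" void launchMyKernel();
// boost::mutex myMutex;
// int main() { launchMyKernel(); return 0; }
```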

Name mangling in CUDA and C++

Submitted by ≡放荡痞女 on 2019-12-11 02:46:41
Question: My C++ project main.cpp, compiled with pgcpp from PGI, calls a function cuda() containing CUDA code in a separate file cuda.cu, compiled with nvcc. Unless I wrap the cuda() function in extern "C" in both the function declaration and the common header file, I get linker errors (undefined references). Without extern "C" (symbol name mismatch => undefined reference): $ nm main.o | grep -y cuda U cuda__FPfPiT2iN32 $ nm cuda.o | grep -y cuda T _Z13cudaPfPiS0_iS0_S0_S0_ With extern "C" (symbol …

How to pass flags to the nvcc compiler in CMake

Submitted by 一曲冷凌霜 on 2019-12-10 21:00:57
Question: I have a C project in CMake in which I have an embedded CUDA kernel module. I want to pass --ptxas-options=-v to nvcc only, in order to view register usage per thread and shared memory usage per block. Searching for how to pass flags to nvcc in CMake, I came across this solution: add_compile_options(myprog PRIVATE $<$<COMPILE_LANGUAGE:C>:-Wall> $<$<COMPILE_LANGUAGE:CUDA>:-arch=sm_20 -ptxas-options=-v> ) but this didn't show me the properties above. I think these flags aren't passed to …
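Two likely problems with the quoted attempt: add_compile_options takes no target name (that signature belongs to target_compile_options, so here `myprog` and `PRIVATE` are silently treated as flags), and the nvcc flag is spelled --ptxas-options=-v (or -Xptxas=-v), not -ptxas-options=-v. A corrected sketch, assuming a target named myprog and first-class CUDA language support (project(... LANGUAGES C CUDA)):

```cmake
# Per-language flags on a single target; the CUDA generator expression
# only fires for .cu files, so C sources never see nvcc flags.
target_compile_options(myprog PRIVATE
  $<$<COMPILE_LANGUAGE:C>:-Wall>
  $<$<COMPILE_LANGUAGE:CUDA>:-arch=sm_20 --ptxas-options=-v>)
```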

nvcc fatal: A single input file is required for a non-link phase when an outputfile is specified

Submitted by 被刻印的时光 ゝ on 2019-12-10 19:51:41
Question: I'm getting this problem with Nsight Eclipse. I just installed CUDA Toolkit 5.0. I have a project that uses several C files and one CUDA file. I read that the problem sometimes arises when you use C files alongside CUDA files in Nsight, so I changed all files in my project to .cu and .cuh extensions. It is also said that the problem sometimes comes from blank spaces in the files' path, which I made sure is not the case here. The error arises when it tries to compile the first file …
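Independent of Nsight, one common way to trigger this exact nvcc error is combining several input files with -c (a non-link phase) and a single -o; a sketch of a failing versus a working invocation (file names are illustrative):

```shell
# Fails: compile phase (-c) with one -o but two input files
nvcc -c kernel.cu helper.cu -o out.o

# Works: one object per translation unit, then a separate link step
nvcc -c kernel.cu -o kernel.o
nvcc -c helper.cu -o helper.o
nvcc kernel.o helper.o -o app
```

In an IDE-generated build this usually means a build-settings entry is injecting an extra input or a stray -o into the per-file compile command.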

__ldg causes slower execution time in certain situation

Submitted by 拜拜、爱过 on 2019-12-10 12:23:54
Question: I posted this issue yesterday already, but it wasn't well received; I have a solid repro now, so please bear with me. Here are the system specs: Tesla K20m with the 331.67 driver, CUDA 6.0, Linux machine. I have a global-memory-read-heavy application, so I tried to optimize it by using the __ldg intrinsic in every single place where I read global memory. However, __ldg did not improve performance at all; running time roughly quadrupled. So my question is: how come replacing glob_mem …
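For reference, the substitution the question describes looks like this (kernel and array names are illustrative). __ldg routes the load through the read-only texture/data cache, available on compute capability 3.5+ parts such as the K20m; it can hurt performance when the access pattern is already well served by the normal cache path, so blanket replacement is not always a win:

```cuda
__global__ void scale(const float* __restrict__ in, float *out, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) {
    // Plain global-memory load:
    //   out[i] = in[i] * 2.0f;
    // Read-only cache load via the __ldg intrinsic:
    out[i] = __ldg(&in[i]) * 2.0f;
  }
}
```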

How to specify alignment for global device variables in CUDA

Submitted by 风流意气都作罢 on 2019-12-10 10:04:41
Question: I would like to declare the alignment of a global device variable in CUDA. Specifically, I have a string declaration like __device__ char str1[] = "some pre-defined string"; With regular gcc, I can request alignment from the compiler as __device__ char str1[] __attribute__ ((aligned (4))) = "some pre-defined string"; However, when I tried this with nvcc, the compiler ignores these requests. The reason I would like to do this is to copy these strings onto a buffer in my kernels, and copying words at …
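The portable spelling in device code is CUDA's __align__ qualifier (C++11 alignas also works with recent nvcc). A sketch with a word-at-a-time copy kernel (kernel name and layout are illustrative):

```cuda
// Character array, 4-byte aligned, so kernels can copy it word by word.
__device__ __align__(4) char str1[] = "some pre-defined string";

__global__ void copyWords(char *dst, int nwords) {
  // The int loads below rely on the 4-byte alignment requested above;
  // dst must be 4-byte aligned as well.
  const int *src = reinterpret_cast<const int *>(str1);
  int *out = reinterpret_cast<int *>(dst);
  for (int w = threadIdx.x; w < nwords; w += blockDim.x)
    out[w] = src[w];
}
```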

Specify compiler NVCC uses to compile host-code

Submitted by 谁说胖子不能爱 on 2019-12-10 07:14:16
Question: When running nvcc, it always uses the Visual C++ compiler (cl.exe). How can I make it use the GCC compiler? Setting the CC environment variable to gcc didn't fix it, and I couldn't find any option for this in the executable's help output. Answer 1: On Windows, NVCC only supports the Visual C++ compiler (cl.exe) for host compilation. You can of course compile .cpp (non-CUDA) code with GCC and link those objects with objects generated by nvcc. Source: https://stackoverflow.com/questions/12117779
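For completeness: on Linux and other non-Windows toolchains (not on Windows, per the answer above), nvcc's host compiler can be selected with the -ccbin option; a sketch:

```shell
# Use g++ as the host-side compiler for the non-device parts of main.cu
nvcc -ccbin g++ -o app main.cu
```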