nvcc

Compiling CUDA with dynamic parallelism fallback - multiple architectures/compute capability

柔情痞子 submitted on 2019-12-08 11:01:47
Question: In one application I have a number of CUDA kernels. Some use dynamic parallelism and some don't. To either provide a fallback when dynamic parallelism is not supported, or let the application continue with reduced/partially available features, how should I compile? At the moment I'm getting "invalid device function" when running kernels compiled with -arch=sm_35 on a 670 (max sm_30) that don't require compute 3.5. AFAIK you can't use multiple -arch=sm_*
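One approach (a sketch only, with hypothetical file names): nvcc does accept multiple `-gencode` clauses in a single invocation, embedding code for several architectures in one fat binary, so the dynamic-parallelism kernels can live in their own translation unit compiled only for compute_35:

```shell
# kernels.cu: plain kernels, built for both architectures.
nvcc -c kernels.cu -gencode arch=compute_30,code=sm_30 \
                   -gencode arch=compute_35,code=sm_35
# dp_kernels.cu: dynamic-parallelism kernels, compute_35 only;
# -rdc=true and -lcudadevrt are required for dynamic parallelism.
nvcc -c dp_kernels.cu -rdc=true -gencode arch=compute_35,code=sm_35
nvcc -arch=sm_35 -o app main.cpp kernels.o dp_kernels.o -lcudadevrt
```

At run time the host code would still need to query the device's compute capability (e.g. via cudaGetDeviceProperties) and skip launching the compute-3.5 kernels on an sm_30 device.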

CUDA 5.5 & Intel C/C++ Compiler on Linux

点点圈 submitted on 2019-12-08 07:01:08
Question: For my current project I need to use CUDA and the Intel C/C++ compilers in the same project. (I rely on the SSYEV implementation of Intel's MKL, which takes roughly 10 times as long with GCC+MKL as with ICC+MKL: ~3 ms from GCC, ~300 µs from ICC.) icc -v reports icc version 12.1.5. NVIDIA states that Intel ICC 12.1 is supported (http://docs.nvidia.com/cuda/cuda-samples/index.html#linux-platforms-supported), but even after downgrading to Intel ICC 12.1.5 (installed as part of the Intel
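If the goal is simply to have nvcc drive icc as its host compiler, nvcc's `-ccbin` option selects the host compiler. A sketch; the MKL link flag is an assumption for illustration, not from the original question:

```shell
# Use icc as nvcc's host compiler; -lmkl_rt links MKL's single dynamic library.
nvcc -ccbin icc -O2 -o app main.cu -lmkl_rt
```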

Why does the compiled binary get smaller when -gencode is used?

我只是一个虾纸丫 submitted on 2019-12-08 06:15:25
Question: Why does the compiled binary get smaller when -gencode is used? My GPU's compute capability is 3.0. NVCC options: without any -gencode option, 1,780,520 bytes; with -gencode=arch=compute_30,code=sm_30, 1,719,080 bytes (smaller); with -gencode=arch=compute_30,code=sm_30 -gencode=arch=compute_61,code=sm_61, 1,780,800 bytes. Answer 1: The NVIDIA documentation says that, for example, nvcc x.cu is equivalent to nvcc x.cu --gpu-architecture=compute_30 --gpu-code=sm_30,compute_30, but in your case nvcc x.cu -gencode=arch=compute_30,code
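The size difference follows from what each invocation embeds. A sketch of the equivalent command lines (x.cu hypothetical):

```shell
# Default: embeds sm_30 SASS *and* compute_30 PTX (the larger binary).
nvcc x.cu --gpu-architecture=compute_30 --gpu-code=sm_30,compute_30
# Single -gencode clause: embeds sm_30 SASS only, hence the smaller file.
nvcc x.cu -gencode=arch=compute_30,code=sm_30
# To reproduce the default with -gencode, request the PTX explicitly too:
nvcc x.cu -gencode=arch=compute_30,code=sm_30 \
          -gencode=arch=compute_30,code=compute_30
```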

Using CImg: LNK1181: cannot open file “m.lib” on windows 7 x64

孤街浪徒 submitted on 2019-12-08 05:46:04
Question: In the CImg Makefile I notice a flag -lm; I think this points to the m.lib file, but for some reason the linker cannot find it. I am compiling the code with the following command: nvcc -o FilledTriangles FilledTriangles.cu -I.. -O2 -lm -lgdi32 ("nvcc" is just the NVIDIA CUDA compiler; it should function similarly to g++). Answer 1: -lm refers to libm.so. In general, -lXYZ is a way of telling the linker that it should resolve the symbols in your compiled code against libXYZ.so (after
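On Windows there is no separate math library: the math routines live in the Microsoft C runtime that cl.exe (nvcc's host compiler there) links by default. So a sketch of the fix is simply to drop -lm from the command:

```shell
# Windows build: no libm to link; gdi32.lib resolves -lgdi32.
nvcc -o FilledTriangles FilledTriangles.cu -I.. -O2 -lgdi32
```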

Why don't the CUDA compiler intrinsics __fadd_rd etc work for me?

喜你入骨 submitted on 2019-12-08 05:06:26
Question: Why can't I use these compiler intrinsics in CUDA 5.0? In Visual Studio 2010, with CUDA Toolkit 5.0 and Nsight installed, I am able to compile and run most CUDA code, but __fadd_ru etc. are reported as undefined. This is the code I am trying to compile. Edit: It seems that the intrinsics become undefined when either of the following includes is made in the same project: #include "cuda_runtime.h" #include "device_launch_parameters.h" Answer 1: The problem is caused (somehow) by including CUDA
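For reference, these intrinsics are device-only: they compile only inside __device__/__global__ functions in a .cu file handled by nvcc. A minimal sketch (the kernel name and values are made up):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// __fadd_rd / __fadd_ru add with directed rounding (toward -inf / +inf);
// they exist only in device code, which is why host-side use is undefined.
__global__ void directed_add(float a, float b, float *out)
{
    out[0] = __fadd_rd(a, b);
    out[1] = __fadd_ru(a, b);
}

int main()
{
    float h[2], *d;
    cudaMalloc(&d, 2 * sizeof(float));
    directed_add<<<1, 1>>>(1.0f, 1e-10f, d);
    cudaMemcpy(h, d, 2 * sizeof(float), cudaMemcpyDeviceToHost);
    printf("rd=%.8e ru=%.8e\n", h[0], h[1]);  // the two roundings differ
    cudaFree(d);
    return 0;
}
```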

Check whether nvcc is available in a makefile

青春壹個敷衍的年華 submitted on 2019-12-07 19:37:30
Question: I have two versions of a function in an application, one implemented in CUDA and the other in standard C. They're in separate files, say cudafunc.h and func.h (the implementations are in cudafunc.cu and func.c). I'd like to offer two options when compiling the application: if the machine has nvcc installed, compile cudafunc.h; otherwise, compile func.h. Is there any way to check in the makefile whether nvcc is installed, and adjust the compiler accordingly?
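One way to express the check is a small shell probe that a makefile can reuse via $(shell ...); a sketch (the printed names just mirror the file names above):

```shell
# Print which implementation the build should use, based on nvcc's presence.
if command -v nvcc >/dev/null 2>&1; then
    echo "cudafunc"
else
    echo "func"
fi
```

In a GNU makefile the same probe becomes `NVCC := $(shell command -v nvcc 2>/dev/null)`, followed by an `ifneq ($(NVCC),)` conditional that selects cudafunc.o and nvcc, with func.o and $(CC) in the else branch.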

How can I use my GPU in an IPython Notebook?

旧巷老猫 submitted on 2019-12-07 02:19:39
Question: OS: Ubuntu 14.04 LTS. Language: Python, Anaconda 2.7 (Keras, Theano). GPU: GTX 980 Ti. CUDA: CUDA 7.5. I want to run Keras Python code in an IPython Notebook using my GPU (GTX 980 Ti), but the notebook can't find it. I want to test the code below. When I run it from the Ubuntu terminal, it uses the GPU without any problem. First I set the path: export PATH=/usr/local/cuda/bin:$PATH export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH Second I run the code: THEANO
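Since the terminal run works, one sketch of a fix is to export the same environment before starting the notebook server, so the notebook kernel inherits it (flags shown are the old Theano 0.x style used in the question):

```shell
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 ipython notebook
```

Alternatively, putting `device = gpu` and `floatX = float32` under `[global]` in ~/.theanorc makes every Theano process pick up the GPU, notebook kernels included.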

Why don't the CUDA compiler intrinsics __fadd_rd etc work for me?

生来就可爱ヽ(ⅴ<●) submitted on 2019-12-06 16:56:52
Why can't I use these compiler intrinsics in CUDA 5.0? In Visual Studio 2010, with CUDA Toolkit 5.0 and Nsight installed, I am able to compile and run most CUDA code, but __fadd_ru etc. are reported as undefined. This is the code I am trying to compile. Edit: It seems that the intrinsics become undefined when either of the following includes is made in the same project: #include "cuda_runtime.h" #include "device_launch_parameters.h" The problem is caused (somehow) by including CUDA runtime headers in the project. The NVCC compiler manages the includes for the CUDA runtime automatically, so you

Why does the compiled binary get smaller when -gencode is used?

左心房为你撑大大i submitted on 2019-12-06 16:46:17
Why does the compiled binary get smaller when -gencode is used? My GPU's compute capability is 3.0. NVCC options: without any -gencode option, 1,780,520 bytes; with -gencode=arch=compute_30,code=sm_30, 1,719,080 bytes (smaller); with -gencode=arch=compute_30,code=sm_30 -gencode=arch=compute_61,code=sm_61, 1,780,800 bytes. The NVIDIA documentation says that, for example, nvcc x.cu is equivalent to nvcc x.cu --gpu-architecture=compute_30 --gpu-code=sm_30,compute_30, but in your case nvcc x.cu -gencode=arch=compute_30,code=sm_30 is equivalent to nvcc x.cu --gpu-architecture=compute_30 --gpu-code=sm_30, which does not include the

Make nvcc output traces on compile error

你离开我真会死。 submitted on 2019-12-06 15:15:26
I am having trouble compiling some code with nvcc. It relies heavily on templates and the like, so the error messages are hard to read. For example, I'm currently getting the message /usr/include/boost/utility/detail/result_of_iterate.hpp:135:338: error: invalid use of qualified-name 'std::allocator_traits<_Alloc>::propagate_on_container_swap', which is not really helpful: no information on where it came from or what the template arguments were. Compiling with e.g. gcc shows really nice output with candidates, template arguments, etc. Is it possible to get that with nvcc too? Or at least
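One sketch of a workaround: nvcc hands the host-side compilation pass to the host compiler (g++ here), and its --dryrun/--verbose modes print each intermediate command, so the failing host step can be rerun by hand to get g++'s full candidate/template diagnostics (the file name is hypothetical):

```shell
# Print the commands nvcc would run, without executing them.
nvcc --dryrun -c broken.cu
# Or run them while echoing each step, then rerun the failing
# host-compiler command manually for g++'s full diagnostics.
nvcc --verbose -c broken.cu 2>&1 | less
```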