nvcc

Compiling CUDA with dynamic parallelism fallback - multiple architectures/compute capability

柔情痞子 submitted on 2019-12-08 11:01:47
Question: In one application I have a number of CUDA kernels. Some use dynamic parallelism and some don't. To either provide a fallback when dynamic parallelism is not supported, or let the application continue with reduced/partially available features, how should I compile? At the moment I'm getting "invalid device function" when running kernels compiled with -arch=sm_35 on a 670 (max sm_30) that don't require compute 3.5. AFAIK you can't use multiple -arch=sm_*
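One approach (a sketch only, with hypothetical file names): nvcc does accept multiple `-gencode` clauses in a single invocation, embedding code for several architectures in one fat binary, so the dynamic-parallelism kernels can live in their own translation unit compiled only for compute_35:

```shell
# kernels.cu: plain kernels, built for both architectures.
nvcc -c kernels.cu -gencode arch=compute_30,code=sm_30 \
                   -gencode arch=compute_35,code=sm_35
# dp_kernels.cu: dynamic-parallelism kernels, compute_35 only;
# -rdc=true and -lcudadevrt are required for dynamic parallelism.
nvcc -c dp_kernels.cu -rdc=true -gencode arch=compute_35,code=sm_35
nvcc -arch=sm_35 -o app main.cpp kernels.o dp_kernels.o -lcudadevrt
```

At run time the host code would still need to query the device's compute capability (e.g. via cudaGetDeviceProperties) and skip launching the compute-3.5 kernels on an sm_30 device.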

CUDA 5.5 & Intel C/C++ Compiler on Linux

点点圈 submitted on 2019-12-08 07:01:08
Question: For my current project I need to use CUDA and the Intel C/C++ compilers in the same project. (I rely on the SSYEV implementation of Intel's MKL, which takes roughly 10 times as long with GCC+MKL as with ICC+MKL: ~3 ms from GCC, ~300 µs from ICC.) icc -v reports icc version 12.1.5. NVIDIA states that Intel ICC 12.1 is supported (http://docs.nvidia.com/cuda/cuda-samples/index.html#linux-platforms-supported), but even after downgrading to Intel ICC 12.1.5 (installed as part of the Intel
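If the goal is simply to have nvcc drive icc as its host compiler, nvcc's `-ccbin` option selects the host compiler. A sketch; the MKL link flag is an assumption for illustration, not from the original question:

```shell
# Use icc as nvcc's host compiler; -lmkl_rt links MKL's single dynamic library.
nvcc -ccbin icc -O2 -o app main.cu -lmkl_rt
```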

Why does the compiled binary get smaller when -gencode is used?

我只是一个虾纸丫 submitted on 2019-12-08 06:15:25
Question: Why does the compiled binary get smaller when -gencode is used? My GPU's compute capability is 3.0. NVCC options: without any -gencode option, 1,780,520 bytes; with -gencode=arch=compute_30,code=sm_30, 1,719,080 bytes (smaller); with -gencode=arch=compute_30,code=sm_30 -gencode=arch=compute_61,code=sm_61, 1,780,800 bytes. Answer 1: The NVIDIA documentation says that, for example, nvcc x.cu is equivalent to nvcc x.cu --gpu-architecture=compute_30 --gpu-code=sm_30,compute_30, but in your case nvcc x.cu -gencode=arch=compute_30,code
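The size difference follows from what each invocation embeds. A sketch of the equivalent command lines (x.cu hypothetical):

```shell
# Default: embeds sm_30 SASS *and* compute_30 PTX (the larger binary).
nvcc x.cu --gpu-architecture=compute_30 --gpu-code=sm_30,compute_30
# Single -gencode clause: embeds sm_30 SASS only, hence the smaller file.
nvcc x.cu -gencode=arch=compute_30,code=sm_30
# To reproduce the default with -gencode, request the PTX explicitly too:
nvcc x.cu -gencode=arch=compute_30,code=sm_30 \
          -gencode=arch=compute_30,code=compute_30
```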

Using CImg: LNK1181: cannot open file “m.lib” on windows 7 x64

孤街浪徒 submitted on 2019-12-08 05:46:04
Question: In the CImg Makefile I notice a flag -lm; I think this points to the m.lib file, but for some reason the linker cannot find it. I am compiling the code with the following command: nvcc -o FilledTriangles FilledTriangles.cu -I.. -O2 -lm -lgdi32 ("nvcc" is just the NVIDIA CUDA compiler; it should function similarly to g++). Answer 1: -lm refers to libm.so. In general, -lXYZ is a way of telling the linker that it should resolve the symbols in your compiled code against libXYZ.so (after
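On Windows there is no separate math library: the math routines live in the Microsoft C runtime that cl.exe (nvcc's host compiler there) links by default. So a sketch of the fix is simply to drop -lm from the command:

```shell
# Windows build: no libm to link; gdi32.lib resolves -lgdi32.
nvcc -o FilledTriangles FilledTriangles.cu -I.. -O2 -lgdi32
```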

Why don't the CUDA compiler intrinsics __fadd_rd etc work for me?

喜你入骨 submitted on 2019-12-08 05:06:26
Question: Why can't I use these compiler intrinsics in CUDA 5.0? In Visual Studio 2010, with CUDA Toolkit 5.0 and Nsight installed, I am able to compile and run most CUDA code, but __fadd_ru etc. are reported as undefined. This is the code I am trying to compile. Edit: It seems that the intrinsics become undefined when either of the following includes is made in the same project: #include "cuda_runtime.h" #include "device_launch_parameters.h" Answer 1: The problem is caused (somehow) by including CUDA
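For reference, these intrinsics are device-only: they compile only inside __device__/__global__ functions in a .cu file handled by nvcc. A minimal sketch (the kernel name and values are made up):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// __fadd_rd / __fadd_ru add with directed rounding (toward -inf / +inf);
// they exist only in device code, which is why host-side use is undefined.
__global__ void directed_add(float a, float b, float *out)
{
    out[0] = __fadd_rd(a, b);
    out[1] = __fadd_ru(a, b);
}

int main()
{
    float h[2], *d;
    cudaMalloc(&d, 2 * sizeof(float));
    directed_add<<<1, 1>>>(1.0f, 1e-10f, d);
    cudaMemcpy(h, d, 2 * sizeof(float), cudaMemcpyDeviceToHost);
    printf("rd=%.8e ru=%.8e\n", h[0], h[1]);  // the two roundings differ
    cudaFree(d);
    return 0;
}
```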

Check whether nvcc is available in a makefile

青春壹個敷衍的年華 submitted on 2019-12-07 19:37:30
Question: I have two versions of a function in an application, one implemented in CUDA and the other in standard C. They're in separate files, say cudafunc.h and func.h (the implementations are in cudafunc.cu and func.c). I'd like to offer two options when compiling the application: if the machine has nvcc installed, compile cudafunc.h; otherwise, compile func.h. Is there any way to check in the makefile whether nvcc is installed, and adjust the compiler accordingly?
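One way to express the check is a small shell probe that a makefile can reuse via $(shell ...); a sketch (the printed names just mirror the file names above):

```shell
# Print which implementation the build should use, based on nvcc's presence.
if command -v nvcc >/dev/null 2>&1; then
    echo "cudafunc"
else
    echo "func"
fi
```

In a GNU makefile the same probe becomes `NVCC := $(shell command -v nvcc 2>/dev/null)`, followed by an `ifneq ($(NVCC),)` conditional that selects cudafunc.o and nvcc, with func.o and $(CC) in the else branch.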

How can I use my GPU in an IPython Notebook?

旧巷老猫 submitted on 2019-12-07 02:19:39
Question: OS: Ubuntu 14.04 LTS. Language: Python, Anaconda 2.7 (Keras, Theano). GPU: GTX 980 Ti. CUDA: CUDA 7.5. I want to run Keras Python code in an IPython Notebook using my GPU (GTX 980 Ti), but the notebook can't find it. I want to test the code below. When I run it from the Ubuntu terminal, it uses the GPU without any problem. First I set the path: export PATH=/usr/local/cuda/bin:$PATH export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH Second I run the code: THEANO
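Since the terminal run works, one sketch of a fix is to export the same environment before starting the notebook server, so the notebook kernel inherits it (flags shown are the old Theano 0.x style used in the question):

```shell
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 ipython notebook
```

Alternatively, putting `device = gpu` and `floatX = float32` under `[global]` in ~/.theanorc makes every Theano process pick up the GPU, notebook kernels included.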

Why don't the CUDA compiler intrinsics __fadd_rd etc work for me?

生来就可爱ヽ(ⅴ<●) submitted on 2019-12-06 16:56:52
Why can't I use these compiler intrinsics in CUDA 5.0? In Visual Studio 2010, with CUDA Toolkit 5.0 and Nsight installed, I am able to compile and run most CUDA code, but __fadd_ru etc. are reported as undefined. This is the code I am trying to compile. Edit: It seems that the intrinsics become undefined when either of the following includes is made in the same project: #include "cuda_runtime.h" #include "device_launch_parameters.h" The problem is caused (somehow) by including CUDA runtime headers in the project. The NVCC compiler manages the includes for the CUDA runtime automatically, so you

Why does the compiled binary get smaller when -gencode is used?

左心房为你撑大大i submitted on 2019-12-06 16:46:17
Why does the compiled binary get smaller when -gencode is used? My GPU's compute capability is 3.0. NVCC options: without any -gencode option, 1,780,520 bytes; with -gencode=arch=compute_30,code=sm_30, 1,719,080 bytes (smaller); with -gencode=arch=compute_30,code=sm_30 -gencode=arch=compute_61,code=sm_61, 1,780,800 bytes. The NVIDIA documentation says that, for example, nvcc x.cu is equivalent to nvcc x.cu --gpu-architecture=compute_30 --gpu-code=sm_30,compute_30, but in your case nvcc x.cu -gencode=arch=compute_30,code=sm_30 is equivalent to nvcc x.cu --gpu-architecture=compute_30 --gpu-code=sm_30, which does not include the

Make nvcc output traces on compile error

你离开我真会死。 submitted on 2019-12-06 15:15:26
I am having trouble compiling some code with nvcc. It relies heavily on templates and the like, so the error messages are hard to read. For example, I'm currently getting the message /usr/include/boost/utility/detail/result_of_iterate.hpp:135:338: error: invalid use of qualified-name 'std::allocator_traits<_Alloc>::propagate_on_container_swap', which is not really helpful: no information on where it came from or what the template arguments were. Compiling with e.g. gcc shows really nice output with candidates, template arguments, etc. Is it possible to get that with nvcc too? Or at least
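One sketch of a workaround: nvcc hands the host-side compilation pass to the host compiler (g++ here), and its --dryrun/--verbose modes print each intermediate command, so the failing host step can be rerun by hand to get g++'s full candidate/template diagnostics (the file name is hypothetical):

```shell
# Print the commands nvcc would run, without executing them.
nvcc --dryrun -c broken.cu
# Or run them while echoing each step, then rerun the failing
# host-compiler command manually for g++'s full diagnostics.
nvcc --verbose -c broken.cu 2>&1 | less
```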