nvcc

NVCC CUDA cross compiling cannot find “-lcudart”

人走茶凉 submitted on 2019-12-06 13:44:36
Question: I have installed CUDA 5.0 and NVCC on my Ubuntu virtual machine and have had problems compiling even a basic CUDA C program. The error is as follows:

user@ubuntu:~/CUDA$ nvcc helloworld.cu -o helloworld.o -target-cpu-arch=ARM -ccbin=/usr/bin/arm-linux-gnueabi-gcc-4.6 --machine=32
/usr/lib/gcc/arm-linux-gnueabi/4.6/../../../../arm-linux-gnueabi/bin/ld: skipping incompatible /usr/local/cuda-5.0/bin/../lib/libcudart.so when searching for -lcudart
/usr/lib/gcc/arm-linux-gnueabi/4.6/../../../..
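The linker message means the ARM cross-linker is being offered the x86 build of libcudart.so from /usr/local/cuda-5.0/lib and rejects it as incompatible. As a hedged sketch (the library directory below is purely illustrative, and the standard x86 CUDA 5.0 toolkit does not appear to ship an ARM libcudart at all), the link step would need to be pointed at an ARM build of the CUDA runtime:

# sketch only: -L path is hypothetical and must point at an ARM build of libcudart
nvcc helloworld.cu -o helloworld \
    -target-cpu-arch=ARM \
    -ccbin=/usr/bin/arm-linux-gnueabi-gcc-4.6 \
    --machine=32 \
    -L/path/to/arm-build-of-cuda/lib -lcudart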

nvlink, relocatable device code and static device libraries

Deadly submitted on 2019-12-06 09:33:40
While investigating some issues with relocatable device code, I stumbled upon something I don't quite understand. This is a use case for what is pictured on slide 6. I used an answer by Robert Crovella as the basis for a repro code. The idea is that we have some relocatable device code compiled into a static library (e.g. some math/toolbox library), and we want to use some functions of that precompiled library in another device library of our program:

libutil.a ---> libtest.so ---> test_pgm

Let's say that this external library contains the following function:

__device__ int my_square (int a);
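For reference, a minimal sketch of that layout and of one possible build sequence (file names, the kernel, and the exact nvcc flags are illustrative assumptions, not the poster's actual project):

// util.cu -- relocatable device code archived into libutil.a
__device__ int my_square(int a) { return a * a; }

// test.cu -- goes into libtest.so and calls the precompiled device function
extern __device__ int my_square(int a);
__global__ void square_kernel(int *out, int v) { *out = my_square(v); }

# illustrative build steps
nvcc -arch=sm_35 -Xcompiler -fPIC -dc util.cu -o util.o
ar rcs libutil.a util.o
nvcc -arch=sm_35 -Xcompiler -fPIC -dc test.cu -o test.o
nvcc -arch=sm_35 -Xcompiler -fPIC -dlink test.o -L. -lutil -o test_dlink.o   # device link
g++ -shared test.o test_dlink.o -L. -lutil -L/usr/local/cuda/lib64 -lcudadevrt -lcudart -o libtest.so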

CUDA 5.0: CUBIN and CUBLAS_device, compute capability 3.5

社会主义新天地 submitted on 2019-12-06 08:13:53
Question: I'm trying to compile a kernel that uses dynamic parallelism to run CUBLAS into a cubin file. When I try to compile the code using the command

nvcc -cubin -m64 -lcudadevrt -lcublas_device -gencode arch=compute_35,code=sm_35 -o test.cubin -c test.cu

I get

ptxas fatal : Unresolved extern function 'cublasCreate_v2'

If I add the -rdc=true compile option it compiles fine, but when I try to load the module using cuModuleLoad I get error 500: CUDA_ERROR_NOT_FOUND. From cuda.h:

/** * This indicates that
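As a hedged sketch (not the poster's code), a cubin built with -rdc=true still carries unresolved references to the device runtime and device-side CUBLAS, and those can be resolved at load time with the driver API's JIT linker instead of cuModuleLoad. Library paths below are illustrative and error checking is elided:

CUlinkState ls;
CUmodule module;
void *linked_cubin;
size_t linked_size;

cuLinkCreate(0, NULL, NULL, &ls);
cuLinkAddFile(ls, CU_JIT_INPUT_CUBIN, "test.cubin", 0, NULL, NULL);
cuLinkAddFile(ls, CU_JIT_INPUT_LIBRARY,
              "/usr/local/cuda/lib64/libcudadevrt.a", 0, NULL, NULL);
cuLinkAddFile(ls, CU_JIT_INPUT_LIBRARY,
              "/usr/local/cuda/lib64/libcublas_device.a", 0, NULL, NULL);
cuLinkComplete(ls, &linked_cubin, &linked_size);
cuModuleLoadData(&module, linked_cubin);   // load the fully linked image
cuLinkDestroy(ls);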

How can I compile a CUDA program for sm_1X AND sm_2X when I have a surface declaration

↘锁芯ラ submitted on 2019-12-06 07:14:02
Question: I am writing a library that uses a surface (to re-sample and write to a texture) for a performance gain:

...
surface<void, 2> my_surf2D; //allows writing to a texture
...

The target platform GPU has compute capability 2.0 and I can compile my code with: nvcc -arch=sm_20 ... and it works just fine. The problem is when I am trying to develop and debug the library on my laptop, which has an NVIDIA ION GPU with compute capability 1.1 (I would also like my library to be backwards compatible). I
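One hedged workaround sketch (illustrative, not the library's real code) is to compile the surface declaration and the surface write only when the device compilation pass targets sm_20 or newer, with a plain global-memory fallback for older architectures:

#if !defined(__CUDA_ARCH__) || __CUDA_ARCH__ >= 200
surface<void, 2> my_surf2D;   // seen by the host pass and the sm_2x device pass
#endif

__global__ void resample(float *fallback_out, int width)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 200
    surf2Dwrite(1.0f, my_surf2D, x * (int)sizeof(float), y);   // fast surface path
#else
    fallback_out[y * width + x] = 1.0f;                        // sm_1x fallback
#endif
}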

Caffe compilation fails due to unsupported gcc compiler version

落爺英雄遲暮 submitted on 2019-12-06 04:54:36
Question: I'm struggling with Caffe compilation and so far have failed to compile it. Steps I followed:

git clone https://github.com/BVLC/caffe.git
cd caffe
mkdir build
cd build
cmake ..
make all

Running make all fails with the following error message:

[ 2%] Building NVCC (Device) object src/caffe/CMakeFiles/cuda_compile.dir/util/cuda_compile_generated_im2col.cu.o
In file included from /usr/include/cuda_runtime.h:59:0, from <command-line>:0:
/usr/include/host_config.h:82:2: error: #error -- unsupported
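The #error is raised by CUDA's host_config.h, which rejects host gcc versions newer than the installed toolkit supports. A hedged sketch of the usual workaround is to point the CUDA part of the build at an older host compiler; the package and version below are illustrative and must match whatever your CUDA release accepts:

sudo apt-get install gcc-5 g++-5
cd caffe/build
cmake .. -DCUDA_HOST_COMPILER=/usr/bin/gcc-5
make all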

check if nvcc is available in makefile

我怕爱的太早我们不能终老 submitted on 2019-12-06 04:30:42
I have two versions of a function in an application, one implemented in CUDA and the other in standard C. They're in separate files, let's say cudafunc.h and func.h (the implementations are in cudafunc.cu and func.c). I'd like to offer two options when compiling the application: if the person has nvcc installed, it compiles cudafunc.h; otherwise, it compiles func.h. Is there any way to check in the makefile whether a machine has nvcc installed, and adjust the compiler accordingly? Thanks a bunch.

This should work, included in your Makefile:

NVCC_RESULT := $(shell which nvcc 2> NULL)
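A complete detection block along the same lines might look like this (a sketch: the gcc fallback and object names are illustrative, and on Linux the redirect target would normally be /dev/null rather than a file literally named NULL):

NVCC_RESULT := $(shell which nvcc 2> /dev/null)
NVCC_TEST := $(notdir $(NVCC_RESULT))

ifeq ($(NVCC_TEST),nvcc)
CC := nvcc
FUNC_OBJ := cudafunc.o
else
CC := gcc
FUNC_OBJ := func.o
endif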

Why do gcc and NVCC (g++) see two different structure sizes?

老子叫甜甜 submitted on 2019-12-06 00:22:29
Question: I am trying to add CUDA to an existing single-threaded C program that was written sometime in the late 90s. To do this I need to mix two languages, C and C++ (nvcc is a C++ compiler). The problem is that the C++ compiler sees a structure as a certain size, while the C compiler sees the same structure as a slightly different size. That's bad. I am really puzzled by this because I can't find a cause for a 4-byte discrepancy.

/usr/lib/gcc/i586-suse-linux/4.3/../../../../i586-suse-linux/bin/ld:
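A hedged way to pin down where the extra 4 bytes come from is to compile the same layout-dump file once with gcc and once with nvcc and diff the output; my_struct and its members below are placeholders for the real shared structure:

/* layout_check.c / layout_check.cu -- print size and member offsets */
#include <stdio.h>
#include <stddef.h>

struct my_struct {      /* placeholder for the real shared struct */
    char   tag;
    double value;       /* doubles are a classic 32-bit alignment suspect */
    int    count;
};

int main(void)
{
    printf("sizeof=%u tag@%u value@%u count@%u\n",
           (unsigned)sizeof(struct my_struct),
           (unsigned)offsetof(struct my_struct, tag),
           (unsigned)offsetof(struct my_struct, value),
           (unsigned)offsetof(struct my_struct, count));
    return 0;
}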

How to specify alignment for global device variables in CUDA

六眼飞鱼酱① submitted on 2019-12-05 22:50:21
I would like to declare the alignment for a global device variable in CUDA. Specifically, I have a string declaration like

__device__ char str1[] = "some pre-defined string";

With normal gcc, I can request alignment from the compiler as

__device__ char str1[] __attribute__ ((aligned (4))) = "some pre-defined string";

However, when I tried this with nvcc, the compiler ignores the request. The reason I would like to do this is to copy these strings onto a buffer in my kernels; copying a word at a time is much faster than copying a byte at a time, but that requires the source string to be aligned.
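A hedged alternative spelling is CUDA's own __align__() qualifier, which nvcc defines for both host and device code; whether nvcc preserves it for this particular __device__ string is worth verifying, but the form would be:

__device__ __align__(4) char str1[] = "some pre-defined string";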

How to hide NVCC's “function was declared but never referenced” warnings?

久未见 submitted on 2019-12-05 21:04:33
When compiling CUDA programs which use Google Test, nvcc will emit false-positive warnings:

function <name> was declared but never referenced

An MCVE:

// test.cu

#include <gtest/gtest.h>

namespace {

__global__ void a_kernel() { printf("Works"); }

TEST(ExampleTest, ExampleTestCase) { a_kernel<<<1, 1>>>(); }

}

Compiling it gives:

$ nvcc test.cu -lgtest -lgtest_main
test.cu(9): warning: function "<unnamed>::ExampleTest_ExampleTestCase_Test::ExampleTest_ExampleTestCase_Test()" was declared but never referenced

This is confirmed with the master branch of Google Test and CUDA 9.1 (I believe it
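One hedged way to silence only this diagnostic is to ask nvcc's front end for the warning's number and then suppress exactly that number; 177 below is an assumed example and should be read off the actual --display_error_number output:

nvcc -Xcudafe --display_error_number test.cu -lgtest -lgtest_main
nvcc -Xcudafe --diag_suppress=177 test.cu -lgtest -lgtest_main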

how to compile Cuda source with Go language's cgo?

不打扰是莪最后的温柔 submitted on 2019-12-05 19:09:29
I wrote a simple program in CUDA C and it works in Eclipse Nsight. This is the source code:

#include <iostream>
#include <stdio.h>

__global__ void add(int a, int b, int *c){
    *c = a + b;
}

int main(void){
    int c;
    int *dev_c;
    cudaMalloc((void**)&dev_c, sizeof(int));
    add<<<1,1>>>(2, 7, dev_c);
    cudaMemcpy(&c, dev_c, sizeof(int), cudaMemcpyDeviceToHost);
    printf("\n2+7= %d\n", c);
    cudaFree(dev_c);
    return 0;
}

Now I'm trying to use this code from Go with cgo!!! So I wrote this new code:

package main

//#include "/usr/local/cuda-7.0/include/cuda.h"
//#include "/usr/local/cuda-7.0/include/cuda_runtime
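The usual pattern (sketched below with illustrative names, not the poster's finished solution) is to keep the CUDA code in its own .cu file behind an extern "C" wrapper, build it into a shared library with nvcc, and let cgo link that library like any C library:

// gpu_add.cu -- illustrative wrapper; build with:
//   nvcc --shared -Xcompiler -fPIC gpu_add.cu -o libgpuadd.so
#include <cuda_runtime.h>

__global__ void add(int a, int b, int *c) { *c = a + b; }

extern "C" int gpu_add(int a, int b)
{
    int c = 0;
    int *dev_c = NULL;
    cudaMalloc((void **)&dev_c, sizeof(int));
    add<<<1, 1>>>(a, b, dev_c);
    cudaMemcpy(&c, dev_c, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dev_c);
    return c;
}

On the Go side, the cgo preamble would then declare int gpu_add(int a, int b); and add something like #cgo LDFLAGS: -L. -lgpuadd -L/usr/local/cuda-7.0/lib64 -lcudart before calling C.gpu_add.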