nvcc

Building a GPL C program with a CUDA module

Submitted by 南楼画角 on 2019-11-27 01:10:37
Question: I am attempting to modify a GPL program written in C. My goal is to replace one function with a CUDA implementation, which means I need to compile with nvcc instead of gcc. I need help building the project, not implementing it (I don't think you need to know anything about CUDA C to help). This is my first time trying to change a C project of moderate complexity that involves a configure script and a Makefile. Honestly, this is my first time doing anything in C in a long time, including
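One common way to do this, sketched here under assumptions: leave every existing .c file on gcc, put only the replacement function in a separate .cu file compiled with nvcc -c, and give that function extern "C" linkage, because nvcc compiles .cu files as C++ and would otherwise mangle the name. The final link then also needs the CUDA runtime, for example -L/usr/local/cuda/lib64 -lcudart (that path assumes a default toolkit install). The names add_one, add_one.h and add_one.cu are hypothetical, error checking is omitted, and the CPU branch exists only so the sketch also builds without nvcc.

```cpp
#include <cstddef>

/* ---- add_one.h (hypothetical): included by the existing C sources ---- */
#ifdef __cplusplus
extern "C" {
#endif
/* Adds 1 to each of the n ints in data (a host pointer). */
void add_one(int *data, int n);
#ifdef __cplusplus
}
#endif

/* ---- add_one.cu (hypothetical): the only file compiled with nvcc -c ---- */
#ifdef __CUDACC__
#include <cuda_runtime.h>

// One thread per element; threads past the end do nothing.
__global__ static void add_one_kernel(int *d, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) d[i] += 1;
}

// extern "C" keeps the symbol unmangled so gcc-built objects can call it.
extern "C" void add_one(int *data, int n) {
  if (n <= 0) return;
  const std::size_t bytes = static_cast<std::size_t>(n) * sizeof(int);
  int *d = nullptr;
  cudaMalloc(&d, bytes);                               // device buffer
  cudaMemcpy(d, data, bytes, cudaMemcpyHostToDevice);  // host -> device
  add_one_kernel<<<(n + 255) / 256, 256>>>(d, n);
  cudaMemcpy(data, d, bytes, cudaMemcpyDeviceToHost);  // waits for the kernel
  cudaFree(d);
}
#else
// CPU stand-in (assumption for this sketch) so the file builds without nvcc.
extern "C" void add_one(int *data, int n) {
  for (int i = 0; i < n; ++i) data[i] += 1;
}
#endif
```

In a configure/Makefile project this usually means adding a pattern rule that runs nvcc -c on .cu files and appending the CUDA runtime to the link flags, while the C compiler settings stay unchanged.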

What is the purpose of using multiple “arch” flags in Nvidia's NVCC compiler?

Submitted by 强颜欢笑 on 2019-11-26 21:44:19
I've recently gotten my head around how NVCC compiles CUDA device code for different compute architectures. From my understanding, when using NVCC's -gencode option, "arch" is the minimum compute architecture required by the programmer's application, and also the minimum device compute architecture that NVCC's JIT compiler will compile PTX code for. I also understand that the "code" parameter of -gencode is the compute architecture which NVCC completely compiles the application for, such that no JIT compilation is necessary. After inspection of various CUDA project Makefiles, I've noticed the
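A short sketch of what several -gencode entries buy you, based on nvcc's documented behavior: each distinct arch=compute_XX value starts a separate device compilation pass with __CUDA_ARCH__ set to XX0 (compute_70 gives 700). code=sm_XX embeds ready-to-run machine code (SASS) for that GPU generation, while code=compute_XX embeds PTX that the driver, not nvcc, can JIT-compile on newer GPUs. Listing several entries produces one fat binary that runs natively on several generations and still has a forward-compatible fallback. The architectures in the comment are examples only, and HOST_DEVICE and compiled_arch are hypothetical names.

```cpp
/* Illustrative build line (architectures are examples only):
     nvcc -gencode arch=compute_50,code=sm_50
          -gencode arch=compute_70,code=sm_70
          -gencode arch=compute_70,code=compute_70 app.cu -o app
   Device code is compiled once per distinct compute_XX value, each time with
   its own __CUDA_ARCH__ (500, then 700); the host pass sees it undefined. */

#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE  // plain C++ compiler: no CUDA qualifiers
#endif

// Reports which compilation pass produced the code that is running.
HOST_DEVICE inline int compiled_arch() {
#ifdef __CUDA_ARCH__
  return __CUDA_ARCH__;  // e.g. 700 when the compute_70 build is used
#else
  return 0;              // host code
#endif
}
```

Called from a kernel on a GPU newer than sm_70, this would still report 700, because the driver JIT-compiles the embedded compute_70 PTX rather than any newer target.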

Why are NVIDIA Pascal GPUs slow at running CUDA kernels when using cudaMallocManaged?

Submitted by 本秂侑毒 on 2019-11-26 14:16:50
Question: I was testing the new CUDA 8 with a Pascal Titan X GPU and expected a speedup for my code, but for some reason it ended up slower. I am on Ubuntu 16.04. Here is the minimal code that reproduces the result:

CUDASample.cuh:

    class CUDASample {
    public:
        void AddOneToVector(std::vector<int> &in);
    };

CUDASample.cu:

    __global__ static void CUDAKernelAddOneToVector(int *data)
    {
        const int x = blockIdx.x * blockDim.x + threadIdx.x;
        const int y = blockIdx.y * blockDim.y + threadIdx.y;
        const
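The question's code is cut off, so the following does not fix it. A commonly cited explanation is that on Pascal with CUDA 8, managed memory is no longer migrated to the GPU wholesale at kernel launch; pages move on demand as the GPU faults on them, and that migration shows up in the kernel's run time. A minimal sketch, assuming a single GPU and the cudaMemPrefetchAsync(ptr, bytes, device, stream) signature of the CUDA 8 to 12 runtime, prefetches the buffer before the launch. The function names are hypothetical, error checks are omitted, and the CPU branch only lets the sketch build without nvcc.

```cpp
#include <cstddef>
#include <vector>

#ifdef __CUDACC__
#include <cuda_runtime.h>

// One thread per element (1-D, unlike the question's 2-D indexing).
__global__ static void add_one_kernel(int *data, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) data[i] += 1;
}

std::vector<int> add_one_managed(const std::vector<int> &in) {
  const int n = static_cast<int>(in.size());
  if (n == 0) return {};
  const std::size_t bytes = in.size() * sizeof(int);
  int *data = nullptr;
  cudaMallocManaged(&data, bytes);
  for (int i = 0; i < n; ++i) data[i] = in[i];  // first touch: pages on host

  int device = 0;
  cudaGetDevice(&device);
  // Migrate the pages up front instead of faulting them in from the kernel.
  cudaMemPrefetchAsync(data, bytes, device, 0);
  add_one_kernel<<<(n + 255) / 256, 256>>>(data, n);
  // Move results back to host memory before the CPU reads them.
  cudaMemPrefetchAsync(data, bytes, cudaCpuDeviceId, 0);
  cudaDeviceSynchronize();

  std::vector<int> out(data, data + n);
  cudaFree(data);
  return out;
}
#else
// Host-only stand-in so the sketch also compiles with a plain C++ compiler.
std::vector<int> add_one_managed(const std::vector<int> &in) {
  std::vector<int> out(in);
  for (int &x : out) x += 1;
  return out;
}
#endif
```

When timing, measuring the kernel separately from the first-touch and prefetch steps makes it clearer whether the slowdown comes from page migration or from the kernel itself.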

CUDA and nvcc: using the preprocessor to choose between float or double

Submitted by 时光怂恿深爱的人放手 on 2019-11-26 12:43:51
Question: The problem: in a .h file, I want to define real as double when compiling for C/C++, or for CUDA with compute capability >= 1.3. When compiling for CUDA with compute capability < 1.3, real should be float. After many hours I came up with this (which does not work):

    # if defined(__CUDACC__)
    # warning * making definitions for cuda
    # if defined(__CUDA_ARCH__)
    # warning __CUDA_ARCH__ is defined
    # else
    # warning __CUDA_ARCH__ is NOT defined
    # endif
    # if (__CUDA_ARCH__ >= 130)
    # define real
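A minimal sketch of the usual fix, assuming the goal is exactly as stated: test for the host pass explicitly. Under nvcc, __CUDACC__ is defined in both the host and device passes, but __CUDA_ARCH__ is defined only while device code is being compiled, and in a #if expression an undefined identifier evaluates to 0, so a bare (__CUDA_ARCH__ >= 130) is false in the host pass. Two caveats: if you really build for compute capability below 1.3, host and device see different types for real, so any struct containing real has a different layout on each side; and recent CUDA toolkits no longer target compute capability 1.x at all. The REAL_H guard name is hypothetical.

```cpp
// real.h (sketch): double on the host and on devices with compute
// capability >= 1.3, float on older devices.
#ifndef REAL_H
#define REAL_H

#if !defined(__CUDA_ARCH__) || (__CUDA_ARCH__ >= 130)
typedef double real;  // host pass, plain C/C++, or sm_13 and newer
#else
typedef float real;   // device pass for sm_10, sm_11, sm_12
#endif

#endif  // REAL_H
```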

Why does nvcc fail to compile a CUDA file with boost::spirit?

Submitted by 自作多情 on 2019-11-26 12:33:58
Question: I'm trying to integrate CUDA into an existing application which uses boost::spirit. Isolating the problem, I've found that the following code does not compile with nvcc:

main.cu:

    #include <boost/spirit/include/qi.hpp>
    int main() {
        exit(0);
    }

Compiling with nvcc -o cudaTest main.cu, I get a lot of errors that can be seen here. But if I rename the file to main.cpp and compile again using nvcc, it works. What is happening here, and how can I fix it?

Answer 1: nvcc sometimes has trouble

Linking error: DSO missing from command line

Submitted by 别来无恙 on 2019-11-26 11:03:46
Question: I am rather new to Linux (Ubuntu 14.04 LTS, 64-bit), coming from Windows, and am attempting to port an existing CUDA project of mine. When linking via

    /usr/local/cuda/bin/nvcc -arch=compute_30 -code=sm_30,compute_30 -o Main.o Display.o FileUtil.o Timer.o NeuralNetwork.o -L/usr/lib -L/usr/local/lib -L/usr/lib/x86_64-linux-gnu -L/usr/local/cuda/lib64 -lGLEW -lglfw3 -lGL -lGLU -lcuda -lcudart

I encounter the following error:

    /usr/bin/ld: /usr/local/lib/libglfw3.a(x11_clipboard.c.o):

What is the purpose of using multiple “arch” flags in Nvidia's NVCC compiler?

Submitted by 佐手、 on 2019-11-26 08:01:41
Question: I've recently gotten my head around how NVCC compiles CUDA device code for different compute architectures. From my understanding, when using NVCC's -gencode option, "arch" is the minimum compute architecture required by the programmer's application, and also the minimum device compute architecture that NVCC's JIT compiler will compile PTX code for. I also understand that the "code" parameter of -gencode is the compute architecture which NVCC completely compiles the application for,