nvcc

Building GPL C program with CUDA module

Question: I am attempting to modify a GPL program written in C. My goal is to replace one method with a CUDA implementation, which means I need to compile with nvcc instead of gcc. I need help building the project, not implementing it (I don't think you need to know anything about CUDA C to help). This is my first time trying to change a C project of moderate complexity that involves a configure script and a Makefile. Honestly, this is my first time doing anything in C in a long time, including
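One common way to approach this kind of build, offered here as a sketch and not as the accepted answer to the question: leave the existing C sources compiled by gcc, compile only the new CUDA file with nvcc, and give the replaced function C linkage so both sides link together. nvcc treats .cu files as C++, so without extern "C" the function name would be mangled and the C callers would not find it. The function name scale_array is hypothetical, and a plain CPU loop stands in for the kernel launch so the sketch compiles with any C++ compiler.

```cpp
// Hypothetical sketch: scale_array is an illustrative name, not taken from
// the GPL project in the question.
#include <cassert>
#include <stddef.h>

// In a real build this declaration would sit in a header shared by the C
// sources (compiled with gcc) and the .cu file (compiled with nvcc).
#ifdef __cplusplus
extern "C" {
#endif
void scale_array(float *data, size_t n, float factor);
#ifdef __cplusplus
}
#endif

// In a real build this definition would sit in the .cu file and launch a
// kernel. A CPU loop stands in here (assumption) so the sketch runs anywhere.
extern "C" void scale_array(float *data, size_t n, float factor) {
    for (size_t i = 0; i < n; ++i) {
        data[i] *= factor;
    }
}
```

In an actual project, the object file produced by nvcc would be added to the existing link step together with the CUDA runtime library (-lcudart), which is the part the Makefile has to learn about.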

What is the purpose of using multiple “arch” flags in Nvidia's NVCC compiler?

I've recently gotten my head around how NVCC compiles CUDA device code for different compute architectures. From my understanding, when using NVCC's -gencode option, "arch" is the minimum compute architecture required by the programmer's application, and also the minimum device compute architecture that NVCC's JIT compiler will compile PTX code for. I also understand that the "code" parameter of -gencode is the compute architecture which NVCC completely compiles the application for, such that no JIT compilation is necessary. After inspection of various CUDA project Makefiles, I've noticed the

Why are NVIDIA Pascal GPUs slow at running CUDA kernels when using cudaMallocManaged?

Question: I was testing the new CUDA 8 with the Pascal Titan X GPU and was expecting a speedup for my code, but for some reason it ends up being slower. I am on Ubuntu 16.04. Here is the minimal code that reproduces the result:

CUDASample.cuh

    class CUDASample{
    public:
        void AddOneToVector(std::vector<int> &in);
    };

CUDASample.cu

    __global__ static void CUDAKernelAddOneToVector(int *data)
    {
        const int x = blockIdx.x * blockDim.x + threadIdx.x;
        const int y = blockIdx.y * blockDim.y + threadIdx.y;
        const

CUDA and nvcc: using the preprocessor to choose between float or double

Question: The problem: given a .h file, I want to define real as double when compiling for C/C++ or for CUDA with compute capability >= 1.3. When compiling for CUDA with compute capability < 1.3, real should be defined as float. After many hours I came up with this (which does not work):

    # if defined(__CUDACC__)
    # warning * making definitions for cuda
    # if defined(__CUDA_ARCH__)
    # warning __CUDA_ARCH__ is defined
    # else
    # warning __CUDA_ARCH__ is NOT defined
    # endif
    # if (__CUDA_ARCH__ >= 130)
    # define real
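A point that may explain part of the difficulty: __CUDA_ARCH__ is defined only while nvcc compiles device code, so a header that tests it at file scope can end up with one definition of real in the host pass and another in the device pass. The sketch below sidesteps this by selecting the type with a macro supplied by the build. USE_FLOAT_REAL is a hypothetical name introduced for illustration (for example passed as -DUSE_FLOAT_REAL when targeting hardware without double support), and halve is a hypothetical helper; neither comes from the question.

```cpp
#include <cassert>

// USE_FLOAT_REAL is a hypothetical, build-supplied macro (assumption).
// Because it does not depend on __CUDA_ARCH__, host and device code see
// the same definition of real.
#if defined(USE_FLOAT_REAL)
typedef float real;
#else
typedef double real;
#endif

// Compile-time sanity check that real is one of the two expected types.
static_assert(sizeof(real) == sizeof(float) || sizeof(real) == sizeof(double),
              "real must be float or double");

// Hypothetical helper showing code written once against the real typedef.
real halve(real x) {
    return x / 2;
}
```

Built without the macro, real is double, which matches the plain C/C++ case the question asks for.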

Why does nvcc fail to compile a CUDA file with boost::spirit?

Question: I'm trying to integrate CUDA into an existing application which uses boost::spirit. Isolating the problem, I've found that the following code does not compile with nvcc:

main.cu:

    #include <boost/spirit/include/qi.hpp>
    int main(){ exit(0); }

Compiling with

    nvcc -o cudaTest main.cu

I get a lot of errors that can be seen here. But if I change the filename to main.cpp and compile again using nvcc, it works. What is happening here and how can I fix it?

Answer 1: nvcc sometimes has trouble

Linking error: DSO missing from command line

Question: I am rather new to Linux (using Ubuntu 14.04 LTS 64-bit), coming from Windows, and am attempting to port over an existing CUDA project of mine. When linking via

    /usr/local/cuda/bin/nvcc -arch=compute_30 -code=sm_30,compute_30 -o Main.o Display.o FileUtil.o Timer.o NeuralNetwork.o -L/usr/lib -L/usr/local/lib -L/usr/lib/x86_64-linux-gnu -L/usr/local/cuda/lib64 -lGLEW -lglfw3 -lGL -lGLU -lcuda -lcudart

I encounter the following error:

    /usr/bin/ld: /usr/local/lib/libglfw3.a(x11_clipboard.c.o):
