nvcc

nvcc/cuda 3.1 - gthr-default.h flood of “declared static” but not defined warnings

Submitted by 妖精的绣舞 on 2019-12-13 21:44:03
Question: When compiling a project with nvcc (using CUDA 3.1), I'm getting a flood of warnings from gthr-default.h: /usr/lib/gcc/x86_64-redhat-linux/4.4.4/../../../../include/c++/4.4.4/x86_64-redhat-linux/bits/gthr-default.h:118: warning: ‘int __gthrw_pthread_once(pthread_once_t*, void (*)())’ declared ‘static’ but never defined /usr/lib/gcc/x86_64-redhat-linux/4.4.4/../../../../include/c++/4.4.4/x86_64-redhat-linux/bits/gthr-default.h:119: warning: ‘void* __gthrw_pthread_getspecific(pthread_key_t)
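
A blunt workaround, hedged as an assumption about what is acceptable here: these warnings come from the host compiler's headers, so forwarding gcc's -w flag through nvcc with -Xcompiler silences them (along with every other host warning). The file name and arch below are hypothetical.

    // vec_add.cu -- illustrative kernel; the point is the compile line.
    // Compile, suppressing all host-compiler warnings (blunt, but quiets gthr-default.h):
    //   nvcc -c -arch=sm_13 -Xcompiler -w vec_add.cu
    __global__ void vec_add(const float *a, const float *b, float *c, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            c[i] = a[i] + b[i];
    }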

Compile Linux gcc in Windows - nvcc in Windows

Submitted by 微笑、不失礼 on 2019-12-13 16:52:15
Question: Here is an interesting question that, if answered positively, would make cross-compiling a whole lot easier. Since gcc is written in C++, would it be possible to recompile the Linux gcc compiler on Windows with MinGW g++ or the Visual Studio C++ compiler, so that the resulting Windows executable would be able to compile C code into Linux programs? If so, what would be needed to do that? So to simplify, here is what I want to do: mingw32-g++ gcc.cpp -o gcc.exe The command will probably not work because it would

Registers and shared memory depending on the compute capability used for compilation?

Submitted by 三世轮回 on 2019-12-13 15:27:52
Question: Hey there, when I compile with nvcc -arch=sm_13 I get: ptxas info : Used 29 registers, 28+16 bytes smem, 7200 bytes cmem[0], 8 bytes cmem[1] When I use nvcc -arch=sm_20 I get: ptxas info : Used 34 registers, 60 bytes cmem[0], 7200 bytes cmem[2], 4 bytes cmem[16] I thought all the kernel parameters were passed in shared memory, but for sm_20 that doesn't seem to be the case...?! Perhaps they are passed in registers instead? The head of my function looks like the following: __global__ void func(double *,
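
For context, and hedged as a general note rather than an analysis of this exact kernel: on compute capability 1.x devices kernel arguments are passed through shared memory, while on 2.x and later they are placed in constant memory and show up in the cmem figures instead. A minimal sketch (hypothetical kernel) for reproducing the two resource reports:

    // report_demo.cu -- compare the ptxas resource summaries:
    //   nvcc -c -arch=sm_13 --ptxas-options=-v report_demo.cu   (parameters counted in smem)
    //   nvcc -c -arch=sm_20 --ptxas-options=-v report_demo.cu   (parameters reported in cmem)
    __global__ void report_demo(const double *in, double *out, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            out[i] = 2.0 * in[i];
    }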

Nvcc has a different version than CUDA

Submitted by China☆狼群 on 2019-12-13 12:27:05
Question: I have installed CUDA 7, but when I run nvcc --version, it prints 6.5. I would like to install the Theano library on a GTX 960 card, but it needs nvcc 7.0. I've tried reinstalling CUDA, but it didn't update nvcc. When I run apt-get install nvidia-cuda-toolkit, it installs only 6.5. How can I update nvcc to version 7.0? Answer 1: Please follow the official installation guide to uninstall your current CUDA environment and then install the latest CUDA development environment; it includes the CUDA SDK,
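
Independent of which nvcc happens to be first on the PATH, a small host program can report which runtime it was built against and what the installed driver supports; cudaRuntimeGetVersion and cudaDriverGetVersion return integers such as 7000 for CUDA 7.0. A minimal sketch, assuming it is built with whichever nvcc is currently on the PATH:

    // versions.cu -- build: nvcc versions.cu -o versions
    #include <cstdio>
    #include <cuda_runtime.h>

    int main()
    {
        int runtime_ver = 0, driver_ver = 0;
        cudaRuntimeGetVersion(&runtime_ver);   // runtime version this binary was compiled/linked against
        cudaDriverGetVersion(&driver_ver);     // highest CUDA version the installed driver supports
        printf("runtime %d, driver %d\n", runtime_ver, driver_ver);   // e.g. 7000 means CUDA 7.0
        return 0;
    }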

cudart_static - when is it necessary?

Submitted by 拟墨画扇 on 2019-12-13 12:22:56
Question: Since newer drivers ship with the CUDA runtime (I can choose 9.1 or 9.2 on the driver download page), my question is: should my library (which uses a CUDA kernel internally) be shipped with -lcudart_static? I had issues launching kernels compiled with 9.2 on systems with 9.1 CUDA drivers. What's the most 'compatible' way of ensuring my library will run everywhere a recent CUDA driver is installed? (I'm already compiling for a virtual architecture.) Answer 1: Since newer drivers ship with the
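
A hedged sketch of one common approach (not necessarily the answer's exact advice): link the CUDA runtime statically into the library and embed PTX for a virtual architecture, so the result depends only on the driver being new enough to JIT the PTX, not on any particular cudart being present on the target machine. The arch and file names below are assumptions.

    // mylib.cu -- build as a shared library with a statically linked CUDA runtime:
    //   nvcc -shared -Xcompiler -fPIC --cudart=static \
    //        -gencode arch=compute_30,code=compute_30 mylib.cu -o libmylib.so
    __global__ void scale_kernel(float *data, float factor, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            data[i] *= factor;
    }

    // Host-side entry point exposed by the library (device pointer assumed).
    void scale_on_gpu(float *device_data, float factor, int n)
    {
        scale_kernel<<<(n + 255) / 256, 256>>>(device_data, factor, n);
    }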

When to use volatile with register/local variables

Submitted by 吃可爱长大的小学妹 on 2019-12-12 19:03:14
Question: What is the meaning of declaring register arrays in CUDA with the volatile qualifier? When I tried the volatile keyword on a register array, it removed the spilling of registers to local memory (i.e., it forced CUDA to use registers instead of local memory). Is this the intended behavior? I did not find any information about the usage of volatile with regard to register arrays in the CUDA documentation. Here is the ptxas -v output for both versions. With volatile qualifier __volatile__
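
A minimal sketch of the kind of comparison involved (the kernel and array size are hypothetical, not the asker's code): compile the same kernel twice, toggling volatile on the per-thread array, and compare the register/spill figures printed by -Xptxas -v.

    // volatile_demo.cu -- compare resource usage:
    //   nvcc -c -arch=sm_20 -Xptxas -v volatile_demo.cu
    #define N 16

    __global__ void volatile_demo(const float *in, float *out)
    {
        // Toggle the volatile qualifier here and recompile to compare the reports.
        volatile float buf[N];
        int tid = blockIdx.x * blockDim.x + threadIdx.x;

        for (int i = 0; i < N; ++i)
            buf[i] = in[tid * N + i];

        float sum = 0.0f;
        for (int i = 0; i < N; ++i)
            sum += buf[i] * buf[i];

        out[tid] = sum;
    }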

Link kernels together

Submitted by 故事扮演 on 2019-12-12 03:09:05
Question: I have a CUDA kernel in one .cu file and another CUDA kernel in a different .cu file. I know that with dynamic parallelism I can call another CUDA kernel from a parent kernel, but I'd like to know if there's any way to do this when the child kernel resides in another .cu file. Answer 1: Yes, you can. The key is to use separate compilation with device code linking, which is available with nvcc. Since this is already required for using dynamic parallelism, there's really nothing new here. Here's a simple
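
A minimal sketch of the separate-compilation setup the answer refers to (file and kernel names are hypothetical): declare the child kernel in the parent's translation unit, then build with relocatable device code and link against the device runtime.

    // child.cu (hypothetical):
    //   __global__ void child_kernel(int *data) { data[threadIdx.x] = threadIdx.x; }
    //
    // parent.cu (shown below). Dynamic parallelism needs sm_35+, -rdc=true and cudadevrt:
    //   nvcc -arch=sm_35 -rdc=true parent.cu child.cu -lcudadevrt -o app
    __global__ void child_kernel(int *data);    // defined in child.cu

    __global__ void parent_kernel(int *data)
    {
        if (threadIdx.x == 0)
            child_kernel<<<1, 32>>>(data);      // launch a kernel that lives in another .cu file
    }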

CUDA constant memory value not correct [duplicate]

Submitted by 孤街浪徒 on 2019-12-12 00:59:55
Question: This question already has an answer here: CUDA writing to constant memory wrong value (1 answer). Closed 5 years ago. I have been reading through many of the SO questions related to constant memory and I still don't understand why my program is not working. Overall it looks as follows: Common.cuh __constant__ int numElements; __global__ void kernelFunction(); Common.cu #include "Common.cuh" #include <stdio.h> __global__ kernelFunction() { printf("NumElements = %d", numElements); } Test.cu
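
The excerpt is cut off, but the linked duplicate deals with a classic pitfall: without relocatable device code, a __constant__ variable defined in a header gets a separate copy in every translation unit that includes it, so a cudaMemcpyToSymbol issued in one .cu file does not change the copy a kernel in another .cu file reads. A hedged sketch of one fix, using an extern declaration plus -rdc=true (the arch flag is an assumption):

    // Build with relocatable device code so the single definition is shared:
    //   nvcc -rdc=true -arch=sm_30 Common.cu Test.cu -o test
    //
    // Common.cuh -- declaration only:
    //   extern __constant__ int numElements;
    //   __global__ void kernelFunction();
    //
    // Common.cu -- the one definition plus the kernel that reads it:
    #include <cstdio>

    __constant__ int numElements;

    __global__ void kernelFunction()
    {
        printf("NumElements = %d\n", numElements);
    }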

Why are NVIDIA Pascal GPUs slow at running CUDA kernels when using cudaMallocManaged?

Submitted by 安稳与你 on 2019-12-11 15:53:25
Question: I was testing the new CUDA 8 along with the Pascal Titan X GPU and was expecting a speedup for my code, but for some reason it ends up being slower. I am on Ubuntu 16.04. Here is the minimum code that can reproduce the result: CUDASample.cuh class CUDASample{ public: void AddOneToVector(std::vector<int> &in); }; CUDASample.cu __global__ static void CUDAKernelAddOneToVector(int *data) { const int x = blockIdx.x * blockDim.x + threadIdx.x; const int y = blockIdx.y * blockDim.y + threadIdx.y; const
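
A common explanation for this class of slowdown (hedged, since the excerpt is truncated): Pascal services cudaMallocManaged allocations by demand paging, so the first kernel to touch the data pays page-fault costs; prefetching the managed buffer to the GPU with cudaMemPrefetchAsync (CUDA 8 and later) before the launch typically restores the expected performance. A minimal sketch with a hypothetical size and arch:

    // prefetch_demo.cu -- build: nvcc -arch=sm_61 prefetch_demo.cu -o prefetch_demo
    #include <cstdio>
    #include <cuda_runtime.h>

    __global__ void add_one(int *data, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            data[i] += 1;
    }

    int main()
    {
        const int n = 1 << 20;
        int *data = nullptr;
        cudaMallocManaged(&data, n * sizeof(int));
        for (int i = 0; i < n; ++i) data[i] = i;

        int device = 0;
        cudaGetDevice(&device);
        // Migrate the pages to the GPU up front instead of faulting them in during the kernel.
        cudaMemPrefetchAsync(data, n * sizeof(int), device, 0);

        add_one<<<(n + 255) / 256, 256>>>(data, n);
        cudaDeviceSynchronize();

        printf("data[0] = %d\n", data[0]);
        cudaFree(data);
        return 0;
    }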

How can one configure mex to pass compiler flags to nvcc

Submitted by 自古美人都是妖i on 2019-12-11 11:52:15
Question: While compiling mex files with nvcc, I have struggled to pass CUDA-specific compiler options to the nvcc compiler, as mex doesn't recognize them. I found some old posts about passing compiler flags and some newer ones, but the questions are quite user-specific, and the mex compiler has changed over the years, so I can't figure out what to do. So, my specific question: What should I do to make mex pass compiler flags to nvcc? A bit more generic: What should one do to make mex pass compiler
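
A hedged workaround that sidesteps mex's flag handling entirely: compile the .cu file to an object with nvcc, passing whatever CUDA-specific flags are needed there, and hand the resulting object to mex for linking. The file name, arch flag, include path, and library path below are assumptions; adjust them for the local MATLAB and CUDA installs.

    // gpu_negate.cu -- hypothetical two-step build:
    //   nvcc -c -O3 -arch=sm_35 -Xcompiler -fPIC -I"$MATLABROOT/extern/include" gpu_negate.cu -o gpu_negate.o
    //   mex gpu_negate.o -L/usr/local/cuda/lib64 -lcudart
    #include "mex.h"

    __global__ void negate(double *data, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            data[i] = -data[i];
    }

    void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
    {
        if (nrhs < 1)
            mexErrMsgTxt("Expected one input array.");

        int n = (int)mxGetNumberOfElements(prhs[0]);
        plhs[0] = mxDuplicateArray(prhs[0]);     // output starts as a copy of the input

        double *d_data = nullptr;
        cudaMalloc(&d_data, n * sizeof(double));
        cudaMemcpy(d_data, mxGetPr(plhs[0]), n * sizeof(double), cudaMemcpyHostToDevice);

        negate<<<(n + 255) / 256, 256>>>(d_data, n);

        cudaMemcpy(mxGetPr(plhs[0]), d_data, n * sizeof(double), cudaMemcpyDeviceToHost);
        cudaFree(d_data);
    }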