cuda

How many CUDA cores are used to process a CUDA warp?

霸气de小男生 submitted on 2020-06-17 15:49:47
Question: I've been reading answers to this and there are conflicting ideas. According to this link https://www.3dgep.com/cuda-thread-execution-model/, two warps (64 threads) can run concurrently on an SM (32 CUDA cores). So I understand that the threads of a warp are split up and processed on 16 CUDA cores. This idea makes sense to me because each CUDA core has one 32-bit ALU. However, other links claim that 1 CUDA core is able to handle 32 concurrent threads (the same as the warp size) (https://cvw.cac.cornell.edu

atomicCAS for bool implementation

霸气de小男生 submitted on 2020-06-09 05:49:12
Question: I'm trying to figure out whether there is a bug in the (now deleted) answer about the implementation of a CUDA-like atomicCAS for bools. The code from the answer (reformatted): static __inline__ __device__ bool atomicCAS(bool *address, bool compare, bool val) { unsigned long long addr = (unsigned long long)address; unsigned pos = addr & 3; // byte position within the int int *int_addr = (int *)(addr - pos); // int-aligned address int old = *int_addr, assumed, ival; do { assumed = old; if(val) ival =

Sum a variable over all threads in a CUDA Kernel and return it to Host

末鹿安然 submitted on 2020-06-09 05:18:04
Question: I'm new to CUDA and I'm trying to implement a kernel to calculate the energy in my Metropolis Monte Carlo simulation. Here is the linear version of the function: float calc_energy(struct frame frm, float L, float rc){ int i,j; float E=0, rij, dx, dy, dz; for(i=0; i<frm.natm; i++) { for(j=i+1; j<frm.natm; j++) { dx = fabs(frm.conf[j][0] - frm.conf[i][0]); dy = fabs(frm.conf[j][1] - frm.conf[i][1]); dz = fabs(frm.conf[j][2] - frm.conf[i][2]); dx = dx - round(dx/L)*L; dy = dy - round(dy/L)*L;

Tensorflow can't find libcuda.so (CUDA 7.5)

让人想犯罪 __ submitted on 2020-05-28 05:05:47
Question: I've installed the CUDA 7.5 toolkit, and TensorFlow inside an anaconda env. The CUDA driver is also installed. The folder containing the .so libraries is in LD_LIBRARY_PATH. When I import tensorflow I get the following error: Couldn't open CUDA library libcuda.so. LD_LIBRARY_PATH: /usr/local/cuda-7.5/lib64 In this folder, there exists a file named libcudart.so (which is actually a symbolic link to libcudart.so.7.5). So (just as a guess) I created a symbolic link to libcudart.so named libcuda.so.

Question about modifying a flag array in CUDA

こ雲淡風輕ζ submitted on 2020-05-27 06:06:31
Question: I am doing research on GPU programming and have a question about modifying a global array from device threads. __device__ float data[10] = {0,0,0,0,0,0,0,0,0,1}; __global__ void gradually_set_global_data() { while (1) { if (data[threadIdx.x + 1]) { atomicAdd(&data[threadIdx.x], data[threadIdx.x + 1]); break; } } } int main() { gradually_set_global_data<<<1, 9>>>(); cudaDeviceReset(); return 0; } The kernel should complete execution with data expected to hold [1,1,1,1,1,1,1,1,1,1], but it gets stuck

Can multiple processes share one CUDA context?

无人久伴 submitted on 2020-05-15 09:26:21
Question: This question is a follow-up on Jason R's comment on Robert Crovella's answer to this original question ("Multiple CUDA contexts for one device - any sense?"): When you say that multiple contexts cannot run concurrently, is this limited to kernel launches only, or does it refer to memory transfers as well? I have been considering a multiprocess design, all on the same GPU, that uses the IPC API to transfer buffers from process to process. Does this mean that effectively, only one process at a

Installing CUDA via brew and dmg

∥☆過路亽.° submitted on 2020-05-11 05:24:06
Question: After attempting to install the NVIDIA toolkit on a Mac by following this guide: http://docs.nvidia.com/cuda/cuda-installation-guide-mac-os-x/index.html#axzz4FPTBCf7X I received the error "Package manifest parsing error", which led me to this: NVidia CUDA toolkit 7.5.27 failing to install on OS X. I unmounted the dmg, and the upshot was that instead of receiving "Package manifest parsing error", the installer would not launch (it seemed to launch briefly, then quit). Installing via the command brew install

detectron2 installation: "Kernel not compiled with GPU support" error

纵然是瞬间 submitted on 2020-05-09 06:40:12
While installing and using detectron2 I ran into the **Kernel not compiled with GPU support** problem. It dragged on for quite a while before I solved it, so I'm summarizing it here for future reference. If you don't care about the journey, you can skip straight to the last section, haha.

environment

Because I'm using my lab's shared server, a lot of things can't be changed. My CUDA environment is as follows: ubuntu; the default nvcc version is 9.2; but nvidia-smi reports 10.0. For a long time I didn't understand why the nvcc and nvidia-smi versions can differ; if you want to know why, see my earlier article 显卡，显卡驱动，nvcc, cuda driver，cudatoolkit，cudnn 到底是什么？ (roughly: what exactly are the GPU, GPU driver, nvcc, CUDA driver, cudatoolkit, and cuDNN?).

reproduce

I usually install PyTorch with Anaconda. The first time, I installed it with the following commands: conda create -n myenv python=3.7 conda activate myenv conda install pytorch torchvision cudatoolkit=10.1 -c pytorch In theory these commands install the CUDA compiler, driver, and so on into the myenv environment, but the error in the title still appeared when running the code. My guess was that it might be because detectron2, when building, uses /usr/local