cuda

How many CUDA cores are used to process a CUDA warp?

霸气de小男生 submitted on 2020-06-17 15:49:47
Question: I've been reading answers to this and there are conflicting ideas. According to this link https://www.3dgep.com/cuda-thread-execution-model/, two warps (64 threads) can run concurrently on an SM (32 CUDA cores). So I understand that the threads of a warp are split up and processed on 16 CUDA cores. This idea makes sense to me because each CUDA core has one 32-bit ALU. However, other links claim that 1 CUDA core is able to handle 32 concurrent threads (the same as the warp size) (https://cvw.cac.cornell.edu

atomicCAS for bool implementation

霸气de小男生 submitted on 2020-06-09 05:49:12
Question: I'm trying to figure out whether there is a bug in the (now deleted) answer about the implementation of a CUDA-like atomicCAS for bools. The code from the answer (reformatted): static __inline__ __device__ bool atomicCAS(bool *address, bool compare, bool val) { unsigned long long addr = (unsigned long long)address; unsigned pos = addr & 3; // byte position within the int int *int_addr = (int *)(addr - pos); // int-aligned address int old = *int_addr, assumed, ival; do { assumed = old; if(val) ival =

Sum a variable over all threads in a CUDA Kernel and return it to Host

末鹿安然 submitted on 2020-06-09 05:18:04
Question: I'm new to CUDA and I'm trying to implement a kernel to calculate the energy in my Metropolis Monte Carlo simulation. Here is the linear version of the function: float calc_energy(struct frame frm, float L, float rc){ int i,j; float E=0, rij, dx, dy, dz; for(i=0; i<frm.natm; i++) { for(j=i+1; j<frm.natm; j++) { dx = fabs(frm.conf[j][0] - frm.conf[i][0]); dy = fabs(frm.conf[j][1] - frm.conf[i][1]); dz = fabs(frm.conf[j][2] - frm.conf[i][2]); dx = dx - round(dx/L)*L; dy = dy - round(dy/L)*L;

Tensorflow can't find libcuda.so (CUDA 7.5)

让人想犯罪 __ submitted on 2020-05-28 05:05:47
Question: I've installed the CUDA 7.5 toolkit, and TensorFlow inside an anaconda env. The CUDA driver is also installed. The folder containing the .so libraries is in LD_LIBRARY_PATH. When I import tensorflow I get the following error: Couldn't open CUDA library libcuda.so. LD_LIBRARY_PATH: /usr/local/cuda-7.5/lib64 In this folder, there exists a file named libcudart.so (which is actually a symbolic link to libcudart.so.7.5). So (just as a guess) I created a symbolic link to libcudart.so named libcuda.so.

Question about modifying a flag array in CUDA

こ雲淡風輕ζ submitted on 2020-05-27 06:06:31
Question: I am doing research on GPU programming and have a question about modifying a global array from device threads. __device__ float data[10] = {0,0,0,0,0,0,0,0,0,1}; __global__ void gradually_set_global_data() { while (1) { if (data[threadIdx.x + 1]) { atomicAdd(&data[threadIdx.x], data[threadIdx.x + 1]); break; } } } int main() { gradually_set_global_data<<<1, 9>>>(); cudaDeviceReset(); return 0; } The kernel should complete execution with data expected to hold [1,1,1,1,1,1,1,1,1,1], but it gets stuck

Can multiple processes share one CUDA context?

无人久伴 submitted on 2020-05-15 09:26:21
Question: This question is a follow-up on Jason R's comment on Robert Crovella's answer to this original question ("Multiple CUDA contexts for one device - any sense?"): When you say that multiple contexts cannot run concurrently, is this limited to kernel launches only, or does it refer to memory transfers as well? I have been considering a multiprocess design, all on the same GPU, that uses the IPC API to transfer buffers from process to process. Does this mean that effectively, only one process at a

Installing CUDA via brew and dmg

∥☆過路亽.° submitted on 2020-05-11 05:24:06
Question: After attempting to install the NVIDIA toolkit on a Mac by following this guide: http://docs.nvidia.com/cuda/cuda-installation-guide-mac-os-x/index.html#axzz4FPTBCf7X I received the error "Package manifest parsing error", which led me to this: NVidia CUDA toolkit 7.5.27 failing to install on OS X. I unmounted the dmg, and the upshot was that instead of receiving "Package manifest parsing error", the installer would not launch (it seemed to launch briefly, then quit). Installing via the command brew install

detectron2 installation: "Kernel not compiled with GPU support" error

纵然是瞬间 submitted on 2020-05-09 06:40:12
While installing and using detectron2 I ran into the **Kernel not compiled with GPU support** problem. It dragged on for quite a while before I solved it, so I'm summarizing it here for future reference. If you don't care about the journey, you can skip straight to the last section, haha.

environment

Because I'm using my lab's shared server, a lot of things can't be changed. My CUDA environment is as follows: ubuntu; the default nvcc version is 9.2; but nvidia-smi reports 10.0. For a long time I didn't understand why the nvcc and nvidia-smi versions can differ; if you want to know why, see my earlier article 显卡，显卡驱动，nvcc, cuda driver，cudatoolkit，cudnn 到底是什么？ (roughly: what exactly are the GPU, GPU driver, nvcc, CUDA driver, cudatoolkit, and cuDNN?).

reproduce

I usually install PyTorch with Anaconda. The first time, I installed it with the following commands: conda create -n myenv python=3.7 conda activate myenv conda install pytorch torchvision cudatoolkit=10.1 -c pytorch In theory these commands install the CUDA compiler, driver, and so on into the myenv environment, but the error in the title still appeared when running the code. My guess was that it might be because detectron2, when building, uses /usr/local