CUDA

Learning PyTorch from Scratch (Part 0): Installing PyTorch

Posted by 时光毁灭记忆、已成空白 on 2021-02-08 08:47:09
Why PyTorch? Personally, I find PyTorch friendlier to newcomers than TensorFlow, and PyTorch is now the more common choice in academia: in the top-conference papers of the last two years, more released code uses PyTorch than TensorFlow. If you are also interested in TensorFlow, you can pick it up after learning PyTorch; the basic principles carry over. Let's start our PyTorch journey!

Installing PyTorch on Ubuntu or Windows: first open the official PyTorch site, https://pytorch.org/. The front page offers a configuration selector; either the CPU or the GPU build will work, but the GPU build is the usual choice because large programs run much faster on it. To run the GPU build you must first install a matching CUDA 9 and cuDNN 7 on the machine (whether it runs Ubuntu or Windows); these installs can run into a pile of problems, so work through them one step at a time. Once everything is installed, a quick sanity check is to run torch.cuda.is_available() in Python, which should return True on a working GPU setup. A step-by-step Ubuntu installation guide is in my blog post: https://blog.csdn.net/xiewenrui1996/article/details/102736238 .

Access GPU hardware specifications in Python?

Posted by 随声附和 on 2021-02-08 08:31:35
Question: I want to access various NVIDIA GPU specifications using Numba or a similar Python CUDA package: information such as available device memory, L2 cache size, memory clock frequency, etc. From reading this question, I learned that I can access some of the information (but not all of it) through Numba's CUDA device interface.

    from numba import cuda

    device = cuda.get_current_device()
    attribs = [s for s in dir(device) if s.isupper()]
    for attr in attribs:
        print(attr, '=', getattr(device, attr))

Output on a …
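For reference, the attributes the question lists that the device interface does not expose can be read through the CUDA runtime API that packages like Numba wrap (Numba itself reports free and total memory via cuda.current_context().get_memory_info()). Below is a minimal CUDA C sketch of the underlying runtime calls, offered as an illustration rather than the thread's accepted answer:

    #include <cstdio>
    #include <cuda_runtime.h>

    int main(void)
    {
        int dev = 0;
        cudaSetDevice(dev);

        // Currently available and total device memory, in bytes.
        size_t free_b, total_b;
        cudaMemGetInfo(&free_b, &total_b);

        // Individual hardware attributes.
        int l2_bytes, mem_clock_khz;
        cudaDeviceGetAttribute(&l2_bytes, cudaDevAttrL2CacheSize, dev);
        cudaDeviceGetAttribute(&mem_clock_khz, cudaDevAttrMemoryClockRate, dev);

        printf("memory: %zu MiB free of %zu MiB\n", free_b >> 20, total_b >> 20);
        printf("L2 cache: %d bytes, memory clock: %d kHz\n", l2_bytes, mem_clock_khz);
        return 0;
    }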

“unknown error” while using dynamic allocation inside __device__ function in CUDA

Posted by 这一生的挚爱 on 2021-02-08 05:24:34
Question: I'm trying to implement a linked list in a CUDA application to model a growing network. In order to do so, I'm using malloc inside a __device__ function, aiming to allocate memory in global memory. The code is:

    void __device__ insereviz(Vizinhos **lista, Nodo *novizinho, int *Gteste)
    {
        Vizinhos *vizinho;

        vizinho = (Vizinhos *)malloc(sizeof(Vizinhos));
        vizinho->viz = novizinho;
        vizinho->proxviz = *lista;
        *lista = vizinho;
        novizinho->k = novizinho->k + 1;
    }

After a certain number of allocated elements …
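The usual cause of this failure pattern (a hedged sketch, not necessarily this thread's accepted answer): device-side malloc draws from a separate heap that defaults to 8 MB, and once that heap is exhausted malloc returns NULL, so the unchecked dereference that follows triggers the "unknown error". Enlarging the heap from the host before the first kernel launch, and checking malloc's return value on the device, usually isolates the problem:

    #include <cstdio>
    #include <cuda_runtime.h>

    __global__ void alloc_test(void)
    {
        // In-kernel malloc comes from the device heap, not cudaMalloc's pool;
        // it returns NULL once that heap is exhausted.
        void *p = malloc(1024);
        if (p == NULL) {
            printf("block %d: device malloc failed\n", blockIdx.x);
            return;
        }
        free(p);
    }

    int main(void)
    {
        // Raise the device heap limit (default 8 MB) BEFORE any kernel launch.
        cudaDeviceSetLimit(cudaLimitMallocHeapSize, 128u * 1024 * 1024);
        alloc_test<<<64, 1>>>();
        cudaDeviceSynchronize();
        return 0;
    }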

Cuda coalesced memory load behavior

Posted by 拟墨画扇 on 2021-02-08 05:08:29
Question: I am working with an array of structures, and I want each block to load one cell of the array into shared memory. For example, block 0 will load array[0] into shared memory and block 1 will load array[1]. In order to do that, I cast the array of structures to float* to try to coalesce the memory accesses. I have two versions of the code.

Version 1:

    __global__ void load_structure(float *label)
    {
        __shared__ float shared_label[48 * 16];
        __shared__ struct LABEL_2D *self_label;

        shared_label[threadIdx…
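For context, the standard way to make such a per-block struct load coalesce is to have consecutive threads read consecutive float words, striding by blockDim.x. The sketch below assumes a hypothetical LABEL_2D of exactly 48*16 floats standing in for the poster's type; it is an illustration of the technique, not the thread's accepted answer:

    #include <cuda_runtime.h>

    #define LABEL_FLOATS (48 * 16)

    struct LABEL_2D {                 // hypothetical stand-in for the poster's struct
        float data[LABEL_FLOATS];
    };

    __global__ void load_structure(const LABEL_2D *labels)
    {
        __shared__ float shared_label[LABEL_FLOATS];

        // View this block's struct as a flat float array so that consecutive
        // threads touch consecutive 4-byte words (coalesced per warp).
        const float *src = reinterpret_cast<const float *>(&labels[blockIdx.x]);

        for (int i = threadIdx.x; i < LABEL_FLOATS; i += blockDim.x)
            shared_label[i] = src[i];

        __syncthreads();
        // ... work on shared_label ...
    }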

switch cuda compute mode to default mode

Posted by 橙三吉。 on 2021-02-07 23:12:12
Question: I use nvidia-smi to see the status of each GPU on a computing node, but I find that one of them is in "E. Thread" (exclusive-thread) compute mode. Is there any easy way to switch it back to the default mode?

    +------------------------------------------------------+
    | NVIDIA-SMI 346.46     Driver Version: 346.46         |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |===============================+======…
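The usual fix, assuming administrative rights on the node (this is the documented nvidia-smi command, not necessarily the thread's exact answer), is to set the compute mode back to DEFAULT with nvidia-smi itself; the setting lasts until the next reboot or driver reload:

    # -c accepts 0/DEFAULT, 2/PROHIBITED, 3/EXCLUSIVE_PROCESS; -i selects the GPU
    sudo nvidia-smi -i 0 -c DEFAULT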

BLAS equivalent of a LAPACK function for GPUs

Posted by 半城伤御伤魂 on 2021-02-07 15:17:26
Question: In LAPACK there is this function for diagonalization:

    SUBROUTINE DSPGVX( ITYPE, JOBZ, RANGE, UPLO, N, AP, BP, VL, VU,
   $                   IL, IU, ABSTOL, M, W, Z, LDZ, WORK, IWORK,
   $                   IFAIL, INFO )

I am looking for its GPU implementation. I am trying to find out whether this function has already been implemented in CUDA (or OpenCL), but have only found CULA, which is not open source. Since CUBLAS exists, I wonder how I could tell whether a BLAS or CUBLAS equivalent of this subroutine is available.

Answer 1: …
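For orientation (a hedged note, not the thread's answer): DSPGVX solves the generalized symmetric-definite eigenproblem A·x = λ·B·x in packed storage, with optional selection of an eigenvalue range. Eigensolvers of this kind sit above BLAS, so the place to look is cuSOLVER (or MAGMA) rather than CUBLAS. cuSOLVER's dense routines use full rather than packed storage, so AP and BP must first be unpacked. A minimal call sketch using cusolverDnDsygvd, which computes all eigenvalues (the range-selecting variant is cusolverDn<t>sygvdx in newer toolkits):

    #include <cuda_runtime.h>
    #include <cusolverDn.h>

    // Solves A*x = lambda*B*x (ITYPE = 1) for n-by-n symmetric A and
    // symmetric positive-definite B, both in full storage on the device.
    // On exit d_W holds the eigenvalues and d_A the eigenvectors.
    void gpu_sygvd(int n, double *d_A, double *d_B, double *d_W)
    {
        cusolverDnHandle_t handle;
        cusolverDnCreate(&handle);

        int lwork = 0;
        cusolverDnDsygvd_bufferSize(handle, CUSOLVER_EIG_TYPE_1,
                                    CUSOLVER_EIG_MODE_VECTOR,
                                    CUBLAS_FILL_MODE_LOWER,
                                    n, d_A, n, d_B, n, d_W, &lwork);

        double *d_work;
        int *d_info;
        cudaMalloc(&d_work, lwork * sizeof(double));
        cudaMalloc(&d_info, sizeof(int));

        cusolverDnDsygvd(handle, CUSOLVER_EIG_TYPE_1,
                         CUSOLVER_EIG_MODE_VECTOR, CUBLAS_FILL_MODE_LOWER,
                         n, d_A, n, d_B, n, d_W, d_work, lwork, d_info);

        cudaFree(d_work);
        cudaFree(d_info);
        cusolverDnDestroy(handle);
    }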

Cuda atomics change flag

Posted by 好久不见. on 2021-02-07 11:13:35
Question: I have a piece of serial code which does something like this:

    if (!variable) {
        // do some initialization here
        variable = true;
    }

I understand that this works perfectly fine in serial code and will be executed only once. Which atomic operation would be the correct one here in CUDA?

Answer 1: It looks to me like what you want is a "critical section" in your code. A critical section allows one thread to execute a sequence of instructions while preventing any other thread or thread block from executing those …
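A minimal sketch of such a critical section built on atomicCAS (one common pattern, assuming a single global flag; not necessarily the answer's exact code): one thread per block contends for a global lock, performs the one-time initialization, and releases the lock. Restricting the spin to one thread per block avoids intra-warp livelock on pre-Volta hardware.

    #include <cuda_runtime.h>

    __device__ int lock = 0;         // 0 = free, 1 = held
    __device__ int initialized = 0;  // the "variable" flag

    __global__ void init_once(void)
    {
        if (threadIdx.x == 0) {
            // Acquire: spin until we swap the lock from 0 to 1.
            while (atomicCAS(&lock, 0, 1) != 0) { }

            if (!initialized) {
                // ... do some initialization here ...
                initialized = 1;
            }

            __threadfence();         // publish the writes before releasing
            atomicExch(&lock, 0);    // release the lock
        }
        __syncthreads();             // rest of the block waits for thread 0
    }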
