CUDA

Add NVIDIA runtime to Docker runtimes

為{幸葍}努か submitted on 2021-01-26 04:37:10
Question: I'm running a virtual machine on GCP with a Tesla GPU and trying to deploy a PyTorch-based app that I want to accelerate with the GPU. I want Docker to use this GPU and to have access to it from containers. I managed to install all the drivers on the host machine, and the app runs fine there, but when I try to run it in Docker (based on the nvidia/cuda image), PyTorch fails: File "/usr/local/lib/python3.6/dist-packages/torch/cuda/__init__.py", line 82, in _check_driver http://www.nvidia.com/Download/index.aspx""")
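A minimal sketch of the usual fix, assuming the NVIDIA Container Toolkit (nvidia-container-runtime) is already installed on the host: register the runtime in /etc/docker/daemon.json, restart the daemon, and request the GPU at run time. The image tag below is an assumption; use whichever nvidia/cuda tag matches your driver.

```sh
# assumes nvidia-container-runtime is installed on the host
sudo tee /etc/docker/daemon.json <<'EOF'
{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
EOF
sudo systemctl restart docker

# either select the runtime explicitly...
docker run --runtime=nvidia --rm nvidia/cuda:10.0-base nvidia-smi
# ...or, on Docker 19.03+, use the --gpus flag
docker run --gpus all --rm nvidia/cuda:10.0-base nvidia-smi
```

If nvidia-smi prints the GPU from inside the container, PyTorch in the container should be able to find the driver as well.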

「ImportError: libcublas.so.10.0: cannot open shared object file: No such file or directory」

早过忘川 submitted on 2021-01-25 07:45:32
This is usually a compatibility problem between the TensorFlow version and the CUDA/cuDNN versions. See the following blog posts: https://blog.csdn.net/qq_18649781/article/details/89045243 and https://blog.csdn.net/omodao1/article/details/83241074 (a compatibility table of TensorFlow, CUDA, and cuDNN versions). [In my case: after setting up an xrdp remote desktop on Ubuntu, the TensorFlow previously installed on the host could no longer be loaded over a remote login and produced this same error.] [After troubleshooting, it was not a version problem but a problem with ldconfig's library path configuration:] sudo ldconfig /usr/local/cuda-9.0/lib64 References: https://blog.csdn.net/eb_num/article/details/89602652 (using the ldconfig command); https://blog.csdn.net/winycg/article/details/80572735 (what the ldconfig command does). Source: oschina Link: https://my.oschina.net/u/4397772/blog/3517735
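A minimal sketch of the ldconfig fix, assuming the toolkit lives under /usr/local/cuda-10.0 (the error names libcublas.so.10.0; adjust the path to whichever CUDA version you actually installed):

```sh
# confirm the library actually exists on disk
ls /usr/local/cuda-10.0/lib64/libcublas.so.10.0

# option 1: refresh the dynamic-linker cache with the CUDA lib directory
sudo ldconfig /usr/local/cuda-10.0/lib64

# option 2: make it permanent via ld.so.conf.d
echo "/usr/local/cuda-10.0/lib64" | sudo tee /etc/ld.so.conf.d/cuda.conf
sudo ldconfig

# verify the loader can now resolve it
ldconfig -p | grep libcublas
```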

Installing PyTorch on Windows 7

你。 submitted on 2021-01-22 04:48:34
Over the past few days, in order to run a Python image-translation project, I had to install PyTorch. It took two days, so I'm recording the experience here. If the versions don't match you get a lot of errors, and most of the fixes found online don't solve the problem. Before installing PyTorch, first make sure your PC has an NVIDIA graphics card, then open the NVIDIA Control Panel and check the CUDA version. For older cards I recommend staying at CUDA 8.0, because CUDA 9.0 requires a compute capability of 3.5 or higher, and my card's compute capability is only 3.0. If your card's CUDA version is too low, you need to upgrade CUDA. Compute capability reference: https://blog.csdn.net/real_myth/article/details/44308169 NVIDIA toolkit downloads: https://developer.nvidia.com/cuda-toolkit-archive After the graphics-card side is set up, install Anaconda3. If you follow the online examples and install PyTorch at this point, it installs CUDA 9.1 by default, and even specifying cuda80 doesn't help, so the only option is to download an offline PyTorch build for CUDA 8.0: https://pan.baidu.com/s/1dF6ayLr?errno=0&errmsg=Auth%20Login%20Sucess&&bduss=&ssnerror=0&traceid=#list/path=%2Fpytorch After downloading it locally, in Anaconda
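The post is cut off right before the install step. A sketch of how an offline conda package is usually installed; the .tar.bz2 filename below is hypothetical — substitute the exact file you downloaded:

```sh
# the filename here is a placeholder; use the offline package you downloaded
conda install --offline pytorch-0.4.0-py36_cuda80_cudnn7_1.tar.bz2

# afterwards, verify that PyTorch sees the GPU
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```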

Preparing a PyTorch environment in JupyterLab

偶尔善良 submitted on 2021-01-21 21:05:11
PyTorch is one of today's mainstream deep learning frameworks, and JupyterLab is a web-based interactive notebook environment. In JupyterLab we can take notes and run PyTorch code at the same time, which is handy for learning, debugging, and later review. This article describes how to prepare such an environment. Learn more: the PyTorch official documentation and the JupyterLab docs. Install Anaconda — Anaconda: https://www.anaconda.com/products/individual#Downloads BFSU mirror: https://mirrors.bfsu.edu.cn/help/anaconda/ Then activate the base environment: conda activate base Install JupyterLab — JupyterLab: https://jupyterlab.readthedocs.io/ It should already be installed with Anaconda; check the version with jupyter --version and otherwise install it with conda install -c conda-forge jupyterlab Run jupyter lab to start it, and the browser will open http://localhost:8888/ For versions < 3.0, installing the TOC extension is recommended: jupyter labextension install @jupyterlab/toc
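The snippet cuts off before the PyTorch install itself. A minimal sketch of the remaining steps; the Python and cudatoolkit versions are assumptions — pick whichever combination matches your driver on the PyTorch site:

```sh
# create a dedicated environment (versions here are assumptions)
conda create -n pytorch python=3.8
conda activate pytorch
conda install pytorch torchvision cudatoolkit=10.2 -c pytorch

# register the environment as a Jupyter kernel so JupyterLab can use it
conda install ipykernel
python -m ipykernel install --user --name pytorch --display-name "PyTorch"
```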

Is it possible to call cuBLAS or cuBLASLt functions from CUDA 10.1 kernels?

荒凉一梦 submitted on 2021-01-21 09:47:25
Question: Concerning CUDA 10.1: I'm doing some calculations on geometric meshes, with a large number of independent calculations done per face of the mesh. I run a CUDA kernel that does the calculation for each face. The calculations involve some matrix multiplication, so I'd like to use cuBLAS or cuBLASLt to speed things up. Since I need to do many matrix multiplications (at least a couple per face), I'd like to do them directly in the kernel. Is this possible? It doesn't seem like cuBLAS or cuBLASLt
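For context: the device-side cuBLAS library, which once allowed cuBLAS calls from inside kernels via dynamic parallelism, was to my knowledge removed around CUDA 10.0, so in CUDA 10.1 these APIs are host-only. A common workaround for many small independent multiplications (such as a couple per mesh face) is a host-side batched GEMM. A minimal sketch, assuming every face uses the same matrix dimensions m, n, k and the matrices are packed back to back in device memory (column-major, as cuBLAS expects):

```cpp
#include <cublas_v2.h>
#include <cuda_runtime.h>

// One GEMM per face: C[i] = A[i] * B[i] for i in [0, batchCount).
// dA, dB, dC hold the per-face matrices contiguously; the stride is
// the number of elements in one matrix.
void batched_face_gemms(const float* dA, const float* dB, float* dC,
                        int m, int n, int k, int batchCount)
{
    cublasHandle_t handle;
    cublasCreate(&handle);

    const float alpha = 1.0f, beta = 0.0f;
    cublasSgemmStridedBatched(handle,
        CUBLAS_OP_N, CUBLAS_OP_N,
        m, n, k,
        &alpha,
        dA, m, (long long)m * k,   // lda, strideA
        dB, k, (long long)k * n,   // ldb, strideB
        &beta,
        dC, m, (long long)m * n,   // ldc, strideC
        batchCount);

    cublasDestroy(handle);
}
```

One library call then covers all faces, which usually beats launching a tiny GEMM per face from the host.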

CUDA: Triple nested loop with reduction inside

拈花ヽ惹草 submitted on 2021-01-20 13:54:22
Question: I need to convert the following code from C++ with OpenMP to C++ with CUDA. As answered in this question: CUDA access matrix stored in RAM and possibility of being implemented. It is possible to write the OpenMP portion in CUDA. The first problem is that I don't know what to do with the sums inside the kernel function. Legacy code: /* definition of variables */ for (int l = 0; l < N_mesh_points_x; l++){ for (int m = 0; m < N_mesh_points_y; m++){ for (int p = 0; p < N_mesh_points_z; p++)
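The legacy code is cut off above, but the recurring pattern — a triple loop with a running sum — maps naturally onto one CUDA thread per (l, m, p) mesh point plus an atomic accumulation. A minimal sketch under that assumption; the names N_mesh_points_* come from the snippet, while f() is a stand-in for the truncated loop body:

```cpp
#include <cuda_runtime.h>
#include <cstdio>

// Stand-in for whatever the truncated loop body computes per mesh point.
__device__ float f(int l, int m, int p) { return 1.0f; }

// One thread per (l, m, p) point; partial results are folded into a single
// global sum with atomicAdd. For large meshes, a block-level shared-memory
// reduction before the atomic is the usual refinement.
__global__ void mesh_sum(float* sum, int nx, int ny, int nz)
{
    int l = blockIdx.x * blockDim.x + threadIdx.x;
    int m = blockIdx.y * blockDim.y + threadIdx.y;
    int p = blockIdx.z * blockDim.z + threadIdx.z;
    if (l < nx && m < ny && p < nz)
        atomicAdd(sum, f(l, m, p));
}

int main()
{
    int nx = 64, ny = 64, nz = 64;
    float *dSum, hSum = 0.0f;
    cudaMalloc(&dSum, sizeof(float));
    cudaMemcpy(dSum, &hSum, sizeof(float), cudaMemcpyHostToDevice);

    dim3 block(8, 8, 8);
    dim3 grid((nx + 7) / 8, (ny + 7) / 8, (nz + 7) / 8);
    mesh_sum<<<grid, block>>>(dSum, nx, ny, nz);

    cudaMemcpy(&hSum, dSum, sizeof(float), cudaMemcpyDeviceToHost);
    printf("sum = %f\n", hSum);
    cudaFree(dSum);
    return 0;
}
```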

Launch Configuration in Thrust

你离开我真会死。 submitted on 2021-01-20 09:30:19
Question: I am trying to run some experiments on an algorithm coded in Thrust. I'd like to know the impact of the number of threads per block on the performance of my algorithm. Is it possible to restrict Thrust so that it does not use more than X threads per block? Answer 1: Thrust doesn't expose any ability to directly set either the number of threads per block or the number of blocks used in a particular kernel call. These things are indirectly determined by algorithm and problem size, but you
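Since Thrust hides the launch configuration, the usual workaround for such experiments is to keep the data in thrust::device_vector but hand-write the kernel and launch it with an explicit block size via thrust::raw_pointer_cast. A minimal sketch; the saxpy body is just a placeholder for the algorithm under test:

```cpp
#include <thrust/device_vector.h>
#include <thrust/raw_pointer_cast.h>

// A trivial per-element operation, standing in for the algorithm's body.
__global__ void saxpy(float a, const float* x, float* y, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

int main()
{
    const int n = 1 << 20;
    thrust::device_vector<float> x(n, 1.0f), y(n, 2.0f);

    // Choose threads-per-block explicitly — the knob Thrust itself hides.
    const int threadsPerBlock = 128;
    const int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;

    saxpy<<<blocks, threadsPerBlock>>>(2.0f,
        thrust::raw_pointer_cast(x.data()),
        thrust::raw_pointer_cast(y.data()), n);
    cudaDeviceSynchronize();
    return 0;
}
```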
