CUDA


Polymorphism and derived classes in CUDA / CUDA Thrust

心不动则不痛 submitted on 2020-01-27 07:56:13
Question: This is my first question on Stack Overflow, and it's quite a long question. The tl;dr version is: how do I work with a thrust::device_vector<BaseClass> if I want it to store objects of different types DerivedClass1, DerivedClass2, etc., simultaneously? I want to take advantage of polymorphism with CUDA Thrust. I'm compiling for an -arch=sm_30 GPU (GeForce GTX 670). Let us take a look at the following problem: suppose there are 80 families in town. 60 of them are married couples, 20 of them
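
A minimal sketch of one common workaround for this situation (hypothetical names; not necessarily the approach taken in the full answer): virtual dispatch breaks once objects are copied to the device through a device_vector, because their vtable pointers still point into host memory, so a flat struct with a type tag plus a branch inside a device functor is often used instead.

#include <thrust/device_vector.h>
#include <thrust/transform.h>
#include <cstdio>

// Hypothetical flat replacement for BaseClass / DerivedClass1 / DerivedClass2:
// a plain struct plus a tag, so no vtable pointer ever crosses the host/device boundary.
enum FamilyKind { COUPLE = 0, OTHER = 1 };

struct Family {
    int   kind;    // FamilyKind tag
    float income;  // example payload
};

// Device-side "virtual call": branch on the tag instead of dispatching through a vtable.
struct TaxFunctor {
    __host__ __device__ float operator()(const Family& f) const {
        return (f.kind == COUPLE) ? 0.10f * f.income   // hypothetical rule per kind
                                  : 0.05f * f.income;
    }
};

int main() {
    thrust::device_vector<Family> families(80);   // e.g. 60 couples + 20 other families
    // ... fill the vector from the host ...
    thrust::device_vector<float> tax(families.size());
    thrust::transform(families.begin(), families.end(), tax.begin(), TaxFunctor());
    printf("tax of family 0: %f\n", (float)tax[0]);
    return 0;
}

Keeping one device_vector per concrete type (one for couples, one for the rest) is the other common layout; both avoid storing objects with virtual functions on the device.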

multi-precision multiplication in CUDA

一笑奈何 submitted on 2020-01-26 04:27:09
Question: I am trying to implement multi-precision multiplication in CUDA. To do that, I have implemented a kernel which should compute the multiplication of a uint32_t operand with a 256-bit operand and put the result in a 288-bit array. So far, I have come up with this code: __device__ __constant__ UN_256fe B_const; __global__ void multiply32x256Kernel(uint32_t A, UN_288bite* result){ uint8_t tid = blockIdx.x * blockDim.x + threadIdx.x; //for managing warps //uint8_t laneid = tid % 32; //allocate
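
Since the snippet above is cut off and the UN_256fe / UN_288bite definitions are not shown, here is a hedged, serial reference sketch of the same 32 x 256-bit product, assuming little-endian 32-bit limbs (8 limbs in, 9 limbs out) and a 64-bit intermediate product for carry propagation:

#include <cstdint>
#include <cstdio>
#include <cuda_runtime.h>

// Assumed layouts: limb[0] is the least significant 32 bits.
struct UN_256fe   { uint32_t limb[8]; };   // 256-bit operand
struct UN_288bite { uint32_t limb[9]; };   // 288-bit result

__device__ __constant__ UN_256fe B_const;

// Single-thread reference version: multiply the 32-bit scalar A by the 256-bit
// constant, propagating the carry limb by limb through a 64-bit product.
__global__ void multiply32x256Kernel(uint32_t A, UN_288bite* result)
{
    if (blockIdx.x == 0 && threadIdx.x == 0) {
        uint64_t carry = 0;
        for (int i = 0; i < 8; ++i) {
            uint64_t p = (uint64_t)A * B_const.limb[i] + carry;
            result->limb[i] = (uint32_t)p;   // low 32 bits of the partial product
            carry = p >> 32;                 // high 32 bits feed the next limb
        }
        result->limb[8] = (uint32_t)carry;   // final carry becomes the top limb
    }
}

On the host, cudaMemcpyToSymbol(B_const, &h_B, sizeof(UN_256fe)) loads the constant operand before launching the kernel; a per-limb parallel version would instead give each of 8 threads one partial product (e.g. via __umulhi) and resolve the carries afterwards.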

Rank of each element in a matrix row using CUDA

本秂侑毒 submitted on 2020-01-26 04:17:05
Question: Is there any way to find the rank of each element in a matrix row separately using CUDA, or any function for this provided by NVIDIA? Answer 1: I don't know of a built-in ranking or argsort function in CUDA or in any of the libraries I am familiar with. You could certainly build such a function out of lower-level operations, using thrust for example. Here is a (non-optimized) outline of a possible solution approach using thrust: $ cat t84.cu #include <thrust/device_vector.h> #include <thrust/copy.h>
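
The thrust outline in the answer is truncated here, so below is a hedged, brute-force sketch of the same idea (hypothetical kernel name): each thread takes one element and counts how many elements in its row are strictly smaller, which is that element's rank. This is O(cols) work per element, not the optimized argsort-based approach.

#include <cstdio>
#include <cuda_runtime.h>

// One thread per element: rank = number of strictly smaller elements in the same row.
// Data is row-major with dimensions rows x cols; ties receive equal ranks.
__global__ void rowRankKernel(const float* data, int* rank, int rows, int cols)
{
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (row >= rows || col >= cols) return;

    float v = data[row * cols + col];
    int r = 0;
    for (int j = 0; j < cols; ++j)
        if (data[row * cols + j] < v) ++r;
    rank[row * cols + col] = r;
}

int main()
{
    const int rows = 2, cols = 4;
    float h_data[rows * cols] = { 3, 1, 4, 1,   9, 2, 6, 5 };
    int   h_rank[rows * cols];

    float* d_data; int* d_rank;
    cudaMalloc(&d_data, sizeof(h_data));
    cudaMalloc(&d_rank, sizeof(h_rank));
    cudaMemcpy(d_data, h_data, sizeof(h_data), cudaMemcpyHostToDevice);

    dim3 block(16, 16);
    dim3 grid((cols + block.x - 1) / block.x, (rows + block.y - 1) / block.y);
    rowRankKernel<<<grid, block>>>(d_data, d_rank, rows, cols);
    cudaMemcpy(h_rank, d_rank, sizeof(h_rank), cudaMemcpyDeviceToHost);

    for (int i = 0; i < rows * cols; ++i) printf("%d ", h_rank[i]);  // 2 0 3 0 3 0 2 1
    printf("\n");
    cudaFree(d_data); cudaFree(d_rank);
    return 0;
}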

CUDA FFT exception

最后都变了- submitted on 2020-01-26 03:15:10
Question: I'm trying to use the CUDA FFT (cuFFT) library. A problem occurs when cufftPlan1d(..) throws an exception. #define NX 256 #define BATCH 10 cufftHandle plan; cufftComplex *data; cudaMalloc((void**)&data, sizeof(cufftComplex)*NX*BATCH); if (cudaGetLastError() != cudaSuccess){ fprintf(stderr, "Cuda error: Failed to allocate\n"); return; } if (cufftPlan1d(&plan, NX, CUFFT_C2C, BATCH) != CUFFT_SUCCESS){ fprintf(stderr, "CUFFT error: Plan creation failed"); return; } When the compiler hits the
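
For reference, a complete, hedged version of the snippet (compiled with nvcc and linked against -lcufft) that checks each call; it does not diagnose the exception itself, since the question is cut off above:

#include <cstdio>
#include <cuda_runtime.h>
#include <cufft.h>

#define NX    256
#define BATCH 10

// Allocate, create a batched 1D C2C plan, run an in-place forward FFT, clean up.
int main()
{
    cufftComplex* data = NULL;
    cudaMalloc((void**)&data, sizeof(cufftComplex) * NX * BATCH);
    if (cudaGetLastError() != cudaSuccess) {
        fprintf(stderr, "Cuda error: Failed to allocate\n");
        return 1;
    }

    cufftHandle plan;
    if (cufftPlan1d(&plan, NX, CUFFT_C2C, BATCH) != CUFFT_SUCCESS) {
        fprintf(stderr, "CUFFT error: Plan creation failed\n");
        cudaFree(data);
        return 1;
    }

    // ... fill `data` with input samples (e.g. cudaMemcpy from the host) ...

    if (cufftExecC2C(plan, data, data, CUFFT_FORWARD) != CUFFT_SUCCESS) {
        fprintf(stderr, "CUFFT error: ExecC2C failed\n");
    }
    cudaDeviceSynchronize();

    cufftDestroy(plan);
    cudaFree(data);
    return 0;
}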

Is it possible to have a CUDA kernel with varying number of parameters?

那年仲夏 submitted on 2020-01-25 20:22:27
Question: I would like to make a kernel which takes a variable number of arguments. Is this possible? I guess this does not work? But why? Answer 1: If you are asking about typical C-style varargs, then no. But because kernels support C++ linkage, there are template and name-mangling tricks which can be used to instantiate different versions of a kernel with argument lists of different lengths and types. Note also that CUDA 7.0 introduces C++11 variadic template support. So there are options to do this
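
A minimal sketch of the variadic-template option mentioned in the answer (assumes CUDA 7.0+ compiled with -std=c++11; the kernel and helper names are made up for illustration):

#include <cstdio>
#include <cuda_runtime.h>

// Recursive device helper: sums any number of arguments (C++11, no fold expressions needed).
__device__ double sum_args() { return 0.0; }

template <typename T, typename... Rest>
__device__ double sum_args(T first, Rest... rest) { return (double)first + sum_args(rest...); }

// Each distinct argument list instantiates (and name-mangles) a separate kernel.
template <typename... Args>
__global__ void varArgKernel(double* out, Args... args)
{
    *out = sum_args(args...);
}

int main()
{
    double* d_out; double h_out;
    cudaMalloc(&d_out, sizeof(double));

    varArgKernel<<<1, 1>>>(d_out, 1, 2.5, 3.0f);    // three mixed-type arguments
    cudaMemcpy(&h_out, d_out, sizeof(double), cudaMemcpyDeviceToHost);
    printf("%f\n", h_out);                          // 6.5

    varArgKernel<<<1, 1>>>(d_out, 10, 20);          // different list, different instantiation
    cudaMemcpy(&h_out, d_out, sizeof(double), cudaMemcpyDeviceToHost);
    printf("%f\n", h_out);                          // 30.0

    cudaFree(d_out);
    return 0;
}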

How do you iterate through a pitched CUDA array?

為{幸葍}努か submitted on 2020-01-25 18:10:52
Question: Having parallelized with OpenMP before, I'm trying to wrap my head around CUDA, which doesn't seem too intuitive to me. At this point, I'm trying to understand exactly how to loop through an array in a parallelized fashion. CUDA by Example is a great start. The snippet on page 43 shows: __global__ void add( int *a, int *b, int *c ) { int tid = blockIdx.x; // handle the data at this index if (tid < N) c[tid] = a[tid] + b[tid]; } Whereas in OpenMP the programmer chooses the number of times the
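
Since the title asks about a pitched array specifically, here is a hedged sketch (hypothetical names) of how the one-thread-per-index pattern from the page-43 snippet extends to a 2D allocation made with cudaMallocPitch, where the pitch is in bytes and every row therefore starts at base + y * pitch:

#include <cstdio>
#include <cuda_runtime.h>

// Each thread handles one (x, y) element of a pitched 2D array.
__global__ void scaleKernel(float* devPtr, size_t pitch, int width, int height, float s)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    float* row = (float*)((char*)devPtr + y * pitch);  // byte arithmetic for the row start
    row[x] *= s;
}

int main()
{
    const int width = 1000, height = 600;
    float* devPtr; size_t pitch;
    cudaMallocPitch(&devPtr, &pitch, width * sizeof(float), height);
    cudaMemset2D(devPtr, pitch, 0, width * sizeof(float), height);

    dim3 block(16, 16);
    dim3 grid((width + block.x - 1) / block.x, (height + block.y - 1) / block.y);
    scaleKernel<<<grid, block>>>(devPtr, pitch, width, height, 2.0f);
    cudaDeviceSynchronize();

    printf("pitch = %zu bytes for a %d-float row\n", pitch, width);
    cudaFree(devPtr);
    return 0;
}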

CUDA kernel for add(a,b,c) using texture objects for a & b - works correctly for 'increment operation' add(a,b,a)?

吃可爱长大的小学妹 submitted on 2020-01-25 11:31:47
Question: I want to implement a CUDA function 'add(a,b,c)' for adding (component-wise) two one-channel floating-point images 'a' and 'b' together and storing the result in the floating-point image 'c', so that 'c = a + b'. The function will be implemented by first binding texture objects 'aTex' and 'bTex' to the pitch-linear images 'a' and 'b', and then accessing the images 'a' and 'b' inside the kernel only via the texture objects 'aTex' and 'bTex'. The sum is stored in 'c' via a simple write to global
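
A hedged sketch of the setup the question describes (texture objects over pitch-linear images, result written to global memory); the helper name and image sizes are placeholders, and it does not by itself settle whether add(a,b,a) is safe, since the texture reads may be served from the texture cache while the global write goes to the same underlying memory:

#include <cstdio>
#include <cuda_runtime.h>

// Reads 'a' and 'b' through texture objects and writes c = a + b to global memory.
__global__ void add(cudaTextureObject_t aTex, cudaTextureObject_t bTex,
                    float* c, size_t cPitch, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    float va = tex2D<float>(aTex, x, y);
    float vb = tex2D<float>(bTex, x, y);
    float* cRow = (float*)((char*)c + y * cPitch);
    cRow[x] = va + vb;
}

// Creates a texture object over an existing pitch-linear float image.
static cudaTextureObject_t makeTex(float* devPtr, size_t pitch, int width, int height)
{
    cudaResourceDesc resDesc = {};
    resDesc.resType = cudaResourceTypePitch2D;
    resDesc.res.pitch2D.devPtr = devPtr;
    resDesc.res.pitch2D.desc = cudaCreateChannelDesc<float>();
    resDesc.res.pitch2D.width = width;
    resDesc.res.pitch2D.height = height;
    resDesc.res.pitch2D.pitchInBytes = pitch;

    cudaTextureDesc texDesc = {};
    texDesc.addressMode[0] = cudaAddressModeClamp;
    texDesc.addressMode[1] = cudaAddressModeClamp;
    texDesc.filterMode = cudaFilterModePoint;      // fetch exact texels
    texDesc.readMode = cudaReadModeElementType;    // return raw float values
    texDesc.normalizedCoords = 0;

    cudaTextureObject_t tex = 0;
    cudaCreateTextureObject(&tex, &resDesc, &texDesc, NULL);
    return tex;
}

int main()
{
    const int width = 640, height = 480;
    float *a, *b, *c; size_t pA, pB, pC;
    cudaMallocPitch(&a, &pA, width * sizeof(float), height);
    cudaMallocPitch(&b, &pB, width * sizeof(float), height);
    cudaMallocPitch(&c, &pC, width * sizeof(float), height);
    // ... fill a and b ...

    cudaTextureObject_t aTex = makeTex(a, pA, width, height);
    cudaTextureObject_t bTex = makeTex(b, pB, width, height);

    dim3 block(16, 16), grid((width + 15) / 16, (height + 15) / 16);
    add<<<grid, block>>>(aTex, bTex, c, pC, width, height);
    cudaDeviceSynchronize();

    cudaDestroyTextureObject(aTex);
    cudaDestroyTextureObject(bTex);
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}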
