nvidia

How can I make TensorFlow run on a GPU with compute capability 2.0?

不打扰是莪最后的温柔 Submitted on 2019-11-27 07:58:51
I've successfully installed TensorFlow (GPU) on Linux Ubuntu 16.04, after making some small changes to get it working with the new Ubuntu LTS release. However, I assumed (who knows why) that my GPU met the minimum requirement of a compute capability of at least 3.5. That was not the case, since my GeForce 820M has just 2.1. Is there a way to make the TensorFlow GPU version work with my GPU? I ask because apparently there was no way to get the TensorFlow GPU version working on Ubuntu 16.04 either, but by searching the internet I found out that was not the case, and indeed I made it
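The prebuilt TensorFlow GPU wheels of that era required compute capability 3.0 or higher (3.5 in some releases), so a 2.1 card cannot work with them. A hedged sketch of the usual workaround, building from source with an explicit capability list via the `TF_CUDA_COMPUTE_CAPABILITIES` variable read by TensorFlow's `./configure`; note that Fermi (2.x) support was dropped from CUDA itself in CUDA 9, so this build is not guaranteed to succeed on such a card:

```shell
# Config sketch: tell the TensorFlow source build which capability to target.
# Fermi (2.x) may still be rejected at build or run time on newer toolchains.
export TF_CUDA_COMPUTE_CAPABILITIES=2.1
./configure
bazel build --config=cuda //tensorflow/tools/pip_package:build_pip_package
```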

Error -1001 in clGetPlatformIDs Call !

心已入冬 Submitted on 2019-11-27 07:35:12
Question: I am trying to start working with OpenCL. I have two NVIDIA graphics cards; I installed the "developer driver" as well as the SDK from the NVIDIA website. I compiled the demos, but when I run ./oclDeviceQuery I see: OpenCL SW Info: Error -1001 in clGetPlatformIDs Call !!! How can I fix it? Does it mean my NVIDIA cards cannot be detected? I am running Ubuntu 10.10, and the X server works properly with the nvidia driver. I am pretty sure the problem is not related to file permissions, as it doesn't work with sudo
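Error -1001 is the ICD loader's CL_PLATFORM_NOT_FOUND_KHR: no OpenCL platform registered itself, which usually means the vendor ICD registration file is missing rather than the cards being undetectable. A diagnostic config sketch (the paths are the conventional Linux ICD locations; the library name inside the file varies by driver version):

```shell
# The OpenCL ICD loader discovers platforms via files in /etc/OpenCL/vendors.
ls /etc/OpenCL/vendors/              # an nvidia.icd entry should be listed
cat /etc/OpenCL/vendors/nvidia.icd   # names the vendor library, e.g. libnvidia-opencl.so.1
```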

nvidia-smi Volatile GPU-Utilization explanation?

こ雲淡風輕ζ Submitted on 2019-11-27 06:47:50
I know that nvidia-smi -l 1 will report the GPU usage every second (similar to the output below). However, I would appreciate an explanation of what Volatile GPU-Util really means. Is it the number of used SMs over the total SMs, or the occupancy, or something else?

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.48                 Driver Version: 367.48                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M
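For reference, NVIDIA's nvidia-smi documentation defines GPU-Util as the percentage of time over the past sample period during which at least one kernel was executing on the GPU; it is neither the fraction of SMs in use nor the occupancy. The same figure can be polled in machine-readable form (requires an NVIDIA GPU and driver; the query fields are documented by `nvidia-smi --help-query-gpu`):

```shell
# Poll utilization once per second as CSV instead of the full table.
nvidia-smi --query-gpu=timestamp,utilization.gpu,utilization.memory --format=csv -l 1
```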

128-bit integer on CUDA?

▼魔方 西西 Submitted on 2019-11-27 06:30:53
I just managed to install my CUDA SDK under Linux Ubuntu 10.04. My graphics card is an NVIDIA GeForce GT 425M, and I'd like to use it for some heavy computational problem. What I wonder is: is there any way to use an unsigned 128-bit integer variable? When using gcc to run my program on the CPU, I was using the __uint128_t type, but using it with CUDA doesn't seem to work. Is there anything I can do to have 128-bit integers on CUDA? Thank you very much. Matteo Monti, Msoft Programming

For best performance, one would want to map the 128-bit type on top of a suitable CUDA vector type, such as uint4, and

Using constants with CUDA

♀尐吖头ヾ Submitted on 2019-11-27 05:50:25
Question: Which is the best way of using constants in CUDA? One way is to define constants in constant memory, like:

// CUDA global constants
__constant__ int M;

int main(void)
{
    ...
    cudaMemcpyToSymbol("M", &M, sizeof(M));
    ...
}

An alternative way would be to use the C preprocessor:

#define M = ...

I would think defining constants with the C preprocessor is much faster. What, then, are the benefits of using constant memory on a CUDA device?

Answer 1: Constants that are known at compile time should be
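For context, a sketch of the constant-memory route with the current API: the string-name overload used in the question, cudaMemcpyToSymbol("M", ...), was removed in CUDA 5, and the symbol itself must be passed. Constant memory suits values fixed at launch time but unknown at compile time, and reads broadcast through a dedicated cache. Variable and kernel names here are illustrative, and this fragment is not compiled here:

```cuda
// Sketch: a runtime-set constant in constant memory (CUDA 5+ symbol API).
__constant__ int d_M;                         // cached, broadcast to all threads

__global__ void useM(int *out) { out[threadIdx.x] = d_M; }

int main(void) {
    int h_M = 42;
    cudaMemcpyToSymbol(d_M, &h_M, sizeof(h_M));  // pass the symbol, not "d_M"
    // ... launch useM<<<grid, block>>>(...) ...
}
```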

CUDA kernel returning vectors

妖精的绣舞 Submitted on 2019-11-27 05:37:57
I have a list of words, and my goal is to match each word in a very, very long phrase. I'm having no problem matching each word; my only problem is returning a vector of structures containing information about each match. In code:

typedef struct {
    int A, B, C;
} Match;

__global__ void Find(veryLongPhrase * _phrase, Words * _word_list, vector<Match> * _matches)
{
    int a, b, c;

    [...]  // Parallel search for each word in the phrase

    if(match)  // When an occurrence is found
    {
        _matches.push_back(new Match{ A = a, B = b, C = c });  // Here comes the unknown, what should I do here???
    }
}

main()
{
    [...]

Calculation on GPU leads to driver error “stopped responding”

☆樱花仙子☆ Submitted on 2019-11-27 04:51:50
Question: I have this little nonsense script here, which I am executing in MATLAB R2013b:

clear all;
n = 2000;
times = 50;
i = 0;

tCPU = tic;
disp 'CPU::'
A = rand(n, n);
B = rand(n, n);
disp '::Go'
for i = 0:times
    CPU = A * B;
end
tCPU = toc(tCPU);

tGPU = tic;
disp 'GPU::'
A = gpuArray(A);
B = gpuArray(B);
disp '::Go'
for i = 0:times
    GPU = A * B;
end
tGPU = toc(tGPU);

fprintf('On CPU: %.2f sec\nOn GPU: %.2f sec\n', tCPU, tGPU);

Unfortunately, after execution I receive a message from Windows saying: "
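The "stopped responding" message almost certainly comes from Windows Timeout Detection and Recovery (TDR): the OS resets any display driver whose GPU work runs longer than about 2 seconds. The usual remedies are to split the computation into shorter chunks, run it on a GPU that does not drive a display, or raise the timeout. A registry fragment for the last option (the TdrDelay value under GraphicsDrivers is documented by Microsoft; edit at your own risk and reboot afterwards):

```
Windows Registry Editor Version 5.00

; Raise the GPU watchdog (TDR) timeout to 60 seconds (0x3c); the default is 2.
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers]
"TdrDelay"=dword:0000003c
```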

How can I get the number of cores in a CUDA device?

◇◆丶佛笑我妖孽 Submitted on 2019-11-27 04:25:40
Question: I am looking for a function that counts the number of cores of my CUDA device. I know each multiprocessor has a specific number of cores, and my CUDA device has 2 multiprocessors. I searched a lot for a device property that reports the number of cores per multiprocessor, but I couldn't find one. I use the code below, but I still need the number of cores. CUDA 7.0, programming language C, Visual Studio 2013. Code:

void printDevProp(cudaDeviceProp devProp)
{
    printf("%s\n", devProp.name);
    printf("Major revision number: %d\n", devProp

Streaming multiprocessors, Blocks and Threads (CUDA)

假如想象 Submitted on 2019-11-27 04:10:01
Question: What is the relationship between a CUDA core, a streaming multiprocessor, and the CUDA model of blocks and threads? What gets mapped to what, what is parallelized, and how? And which is more efficient: maximizing the number of blocks or the number of threads? My current understanding is that there are 8 CUDA cores per multiprocessor, that every CUDA core can execute one CUDA block at a time, and that all the threads in that block are executed serially on that particular core. Is this

CUDA5 Examples: Has anyone translated some cutil definitions to CUDA5?

ⅰ亾dé卋堺 Submitted on 2019-11-27 03:26:55
Question: Has anyone started to work with the CUDA 5 SDK? I have an old project that uses some cutil functions, but they've been abandoned in the new one. The solution is that most functions can be translated from cutil*/cut* to a similarly named sdk* equivalent from the helper*.h headers. As an example: cutStartTimer becomes sdkStartTimer. Just that simple...

Answer 1: Has anyone started to work with the CUDA5 SDK? Probably. Has anyone translated some cutil definitions to CUDA5? Maybe. But why not just use
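As a concrete instance of the cutil-to-helper translation, the timer API maps member for member onto helper_timer.h from the CUDA samples. A sketch (not compiled here, as it needs the samples' include directory on the include path):

```cpp
// Sketch: timing with helper_timer.h, the CUDA 5 replacement for cutil timers.
#include <helper_timer.h>

float time_something() {
    StopWatchInterface *timer = NULL;
    sdkCreateTimer(&timer);               // was cutCreateTimer
    sdkStartTimer(&timer);                // was cutStartTimer
    // ... work to be timed ...
    sdkStopTimer(&timer);                 // was cutStopTimer
    float ms = sdkGetTimerValue(&timer);  // elapsed milliseconds
    sdkDeleteTimer(&timer);               // was cutDeleteTimer
    return ms;
}
```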