nvidia

128-bit integer on CUDA?

喜你入骨 submitted on 2019-11-26 10:24:24

Question: I just managed to install the CUDA SDK under Linux Ubuntu 10.04. My graphics card is an NVIDIA GeForce GT 425M, and I'd like to use it for a heavy computational problem. What I wonder is: is there any way to use unsigned 128-bit integer variables? When building my program for the CPU with gcc, I used the __uint128_t type, but it doesn't seem to work with CUDA. Is there anything I can do to have 128-bit integers in CUDA?

Answer 1: For best performance, one would want to map the 128-bit type
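The answer is cut off above, but the approach it names, mapping the 128-bit type onto a pair of 64-bit words, can be sketched as follows. This is a minimal illustration, not the quoted answer's code; it assumes the low 64 bits live in .x and the high 64 bits in .y of a ulonglong2, and uses the PTX carry instructions add.cc.u64/addc.u64:

    // Sketch: 128-bit unsigned addition built from two 64-bit halves.
    // Assumes a.x/b.x hold the low words and a.y/b.y the high words.
    __device__ ulonglong2 add_uint128(ulonglong2 a, ulonglong2 b)
    {
        ulonglong2 res;
        asm("add.cc.u64 %0, %2, %4;\n\t"  // add low halves, set carry flag
            "addc.u64   %1, %3, %5;\n\t"  // add high halves plus carry
            : "=l"(res.x), "=l"(res.y)
            : "l"(a.x), "l"(a.y), "l"(b.x), "l"(b.y));
        return res;
    }

An equivalent plain-C form is res.x = a.x + b.x; res.y = a.y + b.y + (res.x < a.x); subtraction and multiplication can be composed from 64-bit pieces in the same spirit.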

What is a bank conflict? (Doing CUDA/OpenCL programming)

非 Y 不嫁゛ submitted on 2019-11-26 10:07:20

Question: I have been reading the programming guides for CUDA and OpenCL, and I cannot figure out what a bank conflict is. They just sort of dive into how to solve the problem without elaborating on the subject itself. Can anybody help me understand it? I have no preference whether the help is in the context of CUDA/OpenCL or just bank conflicts in general in computer science.

Answer 1: For NVIDIA (and AMD, for that matter) GPUs, the local memory is divided into memory banks. Each bank can only address one dataset
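As a concrete illustration (my own sketch, not part of the quoted answer): on NVIDIA GPUs, shared memory is striped across 32 four-byte banks, and a conflict occurs when several threads of a warp touch different words in the same bank, which serializes the accesses. The classic case is reading a column of a 32-wide tile, and the classic fix is padding each row by one element:

    #define TILE 32

    __global__ void bankConflictDemo(float *out)
    {
        __shared__ float plain[TILE][TILE];       // 32-float rows: a column read
                                                  // hits one bank 32 times
        __shared__ float padded[TILE][TILE + 1];  // 33-float rows shift each row
                                                  // into a different bank

        plain[threadIdx.y][threadIdx.x]  = threadIdx.x;
        padded[threadIdx.y][threadIdx.x] = threadIdx.x;
        __syncthreads();

        float a = plain[threadIdx.x][0];   // 32-way bank conflict (serialized)
        float b = padded[threadIdx.x][0];  // conflict-free: one bank per thread
        out[threadIdx.y * TILE + threadIdx.x] = a + b;
    }

Launched with a single 32x32 block, the plain column read serializes into 32 transactions while the padded one completes in a single pass.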

How can I make tensorflow run on a GPU with capability 2.x?

ε祈祈猫儿з submitted on 2019-11-26 09:41:29

Question: I've successfully installed tensorflow (GPU) on Linux Ubuntu 16.04 and made some small changes to get it working with the new Ubuntu LTS release. However, I thought (who knows why) that my GPU met the minimum requirement of a compute capability greater than 3.5. That was not the case, since my GeForce 820M has just 2.1. Is there a way of making the tensorflow GPU version work with my GPU? I am asking this question since apparently there was no way of making the tensorflow GPU version
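Whatever the workaround, the first step is to confirm what the card actually reports. A minimal sketch (mine, not from the question) that queries each device's compute capability through the CUDA runtime:

    #include <cstdio>
    #include <cuda_runtime.h>

    int main()
    {
        int count = 0;
        cudaGetDeviceCount(&count);
        for (int i = 0; i < count; ++i) {
            cudaDeviceProp prop;
            cudaGetDeviceProperties(&prop, i);
            // prop.major / prop.minor encode the compute capability, e.g. 2.1
            printf("Device %d: %s, compute capability %d.%d\n",
                   i, prop.name, prop.major, prop.minor);
        }
        return 0;
    }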

What can I do against 'CUDA driver version is insufficient for CUDA runtime version'?

百般思念 submitted on 2019-11-26 08:30:20

Question: When I go to /usr/local/cuda/samples/1_Utilities/deviceQuery and execute

    moose@pc09 /usr/local/cuda/samples/1_Utilities/deviceQuery $ sudo make clean
    rm -f deviceQuery deviceQuery.o
    rm -rf ../../bin/x86_64/linux/release/deviceQuery
    moose@pc09 /usr/local/cuda/samples/1_Utilities/deviceQuery $ sudo make
    "/usr/local/cuda-7.0"/bin/nvcc -ccbin g++ -I../../common/inc -m64 -gencode arch=compute_20,code=sm_20 -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch
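The message in the title means the installed kernel driver is older than the CUDA runtime the binary was built against, so the usual remedy is updating the NVIDIA driver. A small sketch (mine, not from the question) that compares the two versions:

    #include <cstdio>
    #include <cuda_runtime.h>

    int main()
    {
        int driverVersion = 0, runtimeVersion = 0;
        cudaDriverGetVersion(&driverVersion);    // highest CUDA version the driver supports
        cudaRuntimeGetVersion(&runtimeVersion);  // CUDA version of the linked runtime
        // Versions are encoded as 1000*major + 10*minor, e.g. 7000 for CUDA 7.0.
        printf("driver: %d.%d, runtime: %d.%d\n",
               driverVersion / 1000, (driverVersion % 100) / 10,
               runtimeVersion / 1000, (runtimeVersion % 100) / 10);
        if (driverVersion < runtimeVersion)
            printf("Driver is too old for this runtime; update the NVIDIA driver.\n");
        return 0;
    }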

Horrible redraw performance of the DataGridView on one of my two screens

99封情书 submitted on 2019-11-26 07:59:34

Question: I've actually solved this, but I'm posting it for posterity. I ran into a very odd issue with the DataGridView on my dual-monitor system. The issue manifests itself as an EXTREMELY slow repaint of the control (like 30 seconds for a full repaint), but only when it is on one of my screens. When it is on the other, the repaint speed is fine. I have an Nvidia 8800 GT with the latest non-beta drivers (175.something). Is it a driver bug? I'll leave that up in the air, since I have to live with this

How do CUDA blocks/warps/threads map onto CUDA cores?

自作多情 submitted on 2019-11-26 06:51:32

Question: I have been using CUDA for a few weeks, but I have some doubts about the allocation of blocks/warps/threads. I am studying the architecture from a didactic point of view (university project), so reaching peak performance is not my concern. First of all, I would like to check whether I have these facts straight: the programmer writes a kernel and organizes its execution in a grid of thread blocks. Each block is assigned to a Streaming Multiprocessor (SM). Once assigned, it cannot migrate to
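To make the terms concrete, here is a small sketch of my own (not part of the question): within a block, consecutive groups of warpSize (32) threads form warps, so each thread can derive its warp and lane from its index:

    #include <cstdio>

    // Assumes a 1-D block; warpSize is a built-in device constant (32).
    __global__ void whoAmI()
    {
        int warpInBlock = threadIdx.x / warpSize;  // which warp of this block
        int laneInWarp  = threadIdx.x % warpSize;  // position within that warp
        printf("block %d, warp %d, lane %d\n",
               blockIdx.x, warpInBlock, laneInWarp);
    }

Launched as whoAmI<<<2, 64>>>(), each of the two blocks reports two warps of 32 lanes.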

Understanding CUDA grid dimensions, block dimensions and threads organization (simple explanation) [closed]

雨燕双飞 submitted on 2019-11-26 06:51:24

Question: How are threads organized to be executed by a GPU?

Answer 1: Hardware: if a GPU device has, for example, 4 multiprocessing units, and they can run 768 threads each, then at a given moment no more than 4*768 threads will really be running in parallel (if you planned more threads, they will be waiting their turn). Software: threads are organized in blocks. A block is executed by a multiprocessing unit. The threads of a block can be identified (indexed) using 1 dimension (x), 2 dimensions (x,y) or 3 dimensions (x,y,z)
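A sketch of my own (not from the excerpt) showing how block and thread indices combine into a unique global coordinate, here for a 2-D grid of 2-D blocks:

    __global__ void globalIndex2D(int *out, int width, int height)
    {
        int x = blockIdx.x * blockDim.x + threadIdx.x;  // global column
        int y = blockIdx.y * blockDim.y + threadIdx.y;  // global row
        if (x < width && y < height)                    // guard the ragged edge
            out[y * width + x] = y * width + x;         // row-major flattening
    }

A matching launch would be dim3 block(16, 16); dim3 grid((width + 15) / 16, (height + 15) / 16); globalIndex2D<<<grid, block>>>(d_out, width, height);.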

How to measure the inner kernel time in NVIDIA CUDA?

只谈情不闲聊 submitted on 2019-11-26 05:33:38

Question: I want to measure the time taken inside a kernel on the GPU; how can I measure it in NVIDIA CUDA? e.g.

    __global__ void kernelSample()
    {
        // some code here
        // get start time
        // some code here
        // get stop time
        // some code here
    }

Answer 1: Try this; it measures the time between two events in milliseconds.

    cudaEvent_t start, stop;
    float elapsedTime;
    cudaEventCreate(&start);
    cudaEventRecord(start, 0);
    // Do kernel activity here
    cudaEventCreate(&stop);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);
    cudaEventElapsedTime(&elapsedTime, start, stop);
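Note that the event-based answer times the whole launch from the host. For a region inside the kernel, which is what the question asks about, one common approach (my suggestion, not the quoted answer) is the device-side clock64() cycle counter:

    __global__ void kernelSample(long long *cycles)
    {
        // ... some code here ...
        long long start = clock64();  // per-SM cycle counter at region start
        // ... the code section being timed ...
        long long stop = clock64();
        // ... some code here ...
        if (threadIdx.x == 0)
            cycles[blockIdx.x] = stop - start;  // elapsed cycles, not wall time
    }

Dividing the cycle count by the SM clock rate (cudaDeviceProp::clockRate, in kHz) converts it to time.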

How is CUDA memory managed?

倖福魔咒の submitted on 2019-11-26 05:26:00

Question: When I run my CUDA program, which allocates only a small amount of global memory (below 20 MB), I get an "out of memory" error. (From other people's posts, I think the problem is related to memory fragmentation.) I have tried to understand this problem, and realized I have a couple of questions about CUDA memory management. Is there a virtual memory concept in CUDA? If only one kernel is allowed to run on CUDA simultaneously, after its termination, will all of the memory it used or allocated
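When chasing this kind of failure, it helps to see how much device memory the driver actually considers free. A minimal sketch (mine, not from the question) using cudaMemGetInfo:

    #include <cstdio>
    #include <cuda_runtime.h>

    int main()
    {
        size_t freeBytes = 0, totalBytes = 0;
        // Free and total device memory as the driver currently sees them; a large
        // gap between "free" and what cudaMalloc will actually grant can hint at
        // fragmentation or at memory held by other contexts.
        cudaMemGetInfo(&freeBytes, &totalBytes);
        printf("free: %zu MiB, total: %zu MiB\n",
               freeBytes >> 20, totalBytes >> 20);
        return 0;
    }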

Swing rendering appears broken in JDK 1.8, correct in JDK 1.7

我与影子孤独终老i submitted on 2019-11-26 01:33:48

Question: I have installed IntelliJ IDEA (13.1.1 #IC-135.480) and JDK 1.8.0 (x64), and I generated some GUI with the GUI Form designer. Then I ran the code and realized that something is not all right. Here is a screenshot of my GUI: the rendering of the font does not look right, and additionally the button loses its text when I move my mouse over it. So I installed JDK 1.7.0_40 (x64), recompiled the project and ran it again. The following form appears when I use JDK 1.7: the rendering seems to be OK and