nvidia | 易学教程

CUDA C programming with 2 video cards

I am very new to CUDA programming and was reading the 'CUDA C Programming Guide' provided by NVIDIA ( http://developer.download.nvidia.com/compute/DevZone/docs/html/C/doc/CUDA_C_Programming_Guide.pdf ). On page 25, it has the following C code that performs matrix multiplication. Can you please tell me how I can make that code run on two devices (if I have two NVIDIA CUDA-capable cards installed in my computer)? Can you please show me an example? // Matrices are stored in row-major order: // M(row, col) = *(M.elements + row * M.stride + col) typedef struct { int width; int height; int
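One common way to spread a product C = A * B over two devices is to give each device a contiguous band of rows of A (and the matching rows of C) together with a full copy of B. The sketch below, in plain C++, shows only that row partitioning. `RowRange` and `split_rows` are hypothetical names that do not come from the Programming Guide, and the CUDA calls that would consume each range are described in comments rather than executed.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper type: a half-open band of matrix rows [begin, end).
struct RowRange {
    int begin;
    int end;
};

// Split `height` rows into `device_count` contiguous bands whose sizes
// differ by at most one row. Assumes height >= 0 and device_count > 0.
std::vector<RowRange> split_rows(int height, int device_count) {
    assert(height >= 0 && device_count > 0);
    std::vector<RowRange> ranges;
    const int base = height / device_count;   // rows every device gets
    const int extra = height % device_count;  // first `extra` devices get one more
    int start = 0;
    for (int d = 0; d < device_count; ++d) {
        const int rows = base + (d < extra ? 1 : 0);
        ranges.push_back({start, start + rows});
        start += rows;
    }
    return ranges;
}

// In a CUDA program, a loop over the returned ranges would typically,
// for device d:
//   1. call cudaSetDevice(d) to select that card,
//   2. copy rows [begin, end) of A and all of B to that device,
//   3. launch the multiplication kernel on that band,
//   4. copy rows [begin, end) of C back to the host.
```

With 5 rows and 2 devices this gives device 0 the rows [0, 3) and device 1 the rows [3, 5). Splitting by rows of A keeps the kernel itself unchanged, at the cost of duplicating B on both cards.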

Invalid device symbol when copying to CUDA constant memory

Question: I have several files for an image-processing app. Because the number of rows and columns of an image does not change while an image-processing algorithm runs, I was trying to put those values in constant memory. My app looks like: Imageproc.cuh ... ... __constant__ int c_rows; __constant__ int c_cols; #ifdef __cplusplus extern "C" { #endif ... ... #ifdef __cplusplus } #endif Imageproc.cu ... ... int algorithm(float *a, const int rows, const int cols){ ... ... checkCudaError(cudaMemcpyToSymbol

Error installing nvidia driver Ubuntu 14.04 [closed]

I'm having trouble installing the driver for a GTX 980 on Ubuntu 14.04. I need to upgrade to CUDA 7.5 and the latest driver. I tried both the .run installer and the .deb installer, purging before each installation. Here is the log: Using built-in stream user interface -> Detected 8 CPUs online; setting concurrency level to 8. -> License accepted by command line option. -> Installing NVIDIA driver version 352.39. -> There appears to already be a driver installed on your system (version: 352.39). As part of installing this driver (version: 352.39), the existing driver will be uninstalled. Are you sure

What's the correct and most efficient way to use the mapped (zero-copy) memory mechanism in the NVIDIA OpenCL environment?

NVIDIA provides an example showing how to profile bandwidth between host and device; you can find the code here: https://developer.nvidia.com/opencl (search "bandwidth"). The experiment was carried out on a 64-bit Ubuntu 12.04 computer. I am inspecting pinned memory and the mapped access mode, which can be tested by invoking: ./bandwidthtest --memory=pinned --access=mapped The core test loop for host-to-device bandwidth is at around lines 736~748. I list it here, with some comments and context code added: //create a buffer cmPinnedData in host cmPinnedData = clCreateBuffer(cxGPUContext, CL_MEM

NVIDIA and Arm Partner to Advance IoT Intelligence

NVIDIA Deep Learning Accelerator IP to be Integrated into Arm Project Trillium Platform, Easing Building of Deep Learning IoT Chips GPU Technology Conference — NVIDIA and Arm today announced that they are partnering to bring deep learning inferencing to the billions of mobile, consumer electronics and Internet of Things devices that will enter the global marketplace. Under this partnership, NVIDIA and Arm will integrate the open-source NVIDIA Deep Learning Accelerator (NVDLA) architecture into Arm’s Project Trillium platform for machine learning. The collaboration will make it simple for IoT

Is it possible to run Java3D applications on Nvidia 3D Vision hardware?

Question: Is it possible to run a Java3D application on Nvidia 3D Vision hardware? I've got an existing Java3D application that can run in stereoscopic 3D. In the past, I've always run the application on Quadro cards using the OpenGL renderer and quad-buffered stereo. I now have access to a laptop with the nVidia 3D Vision system (with a GeForce GTX 460M). From the documentation, it seems like it should be possible to run my application in stereo if I use the DirectX bindings and let the nVidia drivers

Median selection in CUDA kernel

Question: I need to compute the median of an array of size p inside a CUDA kernel (in my case, p is small, e.g. p = 10). I am using an O(p^2) algorithm for its simplicity, but at the cost of time performance. Is there a "function" to find the median efficiently that I can call inside a CUDA kernel? I know I could implement a selection algorithm, but I'm looking for a function and/or tested code. Thanks!

Answer 1: Here are a few hints: Use a better selection algorithm: QuickSelect is a faster version of
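For very small p, a simple alternative to a full sort is a partial selection sort that stops once the middle position is filled. The following is a minimal sketch under stated assumptions, not the answer's own code and not QuickSelect: `small_median`, `MAX_P`, and the `HD` macro are hypothetical names, `MAX_P = 32` is an arbitrary bound on p, and for even p the function returns the lower of the two middle elements. It is written as plain C++ so it can be checked on the host; when compiled with nvcc the `HD` macro marks it as callable from a kernel.

```cpp
#include <cassert>

// Under nvcc, make the function usable from both host and device code.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Hypothetical upper bound on p, so the scratch copy can live in a
// fixed-size local array.
constexpr int MAX_P = 32;

// Returns the lower median of data[0..p): the element that would sit at
// index (p - 1) / 2 if the array were sorted. The input is not modified.
HD inline float small_median(const float* data, int p) {
    assert(p > 0 && p <= MAX_P);

    float buf[MAX_P];
    for (int i = 0; i < p; ++i) {
        buf[i] = data[i];
    }

    const int mid = (p - 1) / 2;
    // Partial selection sort: place the smallest remaining element at
    // position i, and stop as soon as position `mid` has been filled.
    for (int i = 0; i <= mid; ++i) {
        int min_idx = i;
        for (int j = i + 1; j < p; ++j) {
            if (buf[j] < buf[min_idx]) {
                min_idx = j;
            }
        }
        const float tmp = buf[i];
        buf[i] = buf[min_idx];
        buf[min_idx] = tmp;
    }
    return buf[mid];
}
```

Inside a kernel, each thread could call `small_median(ptr, p)` on its own p-element slice. The work is still quadratic in p, but the outer loop runs only about half as many passes as a full selection sort.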

C# Performance Counter Help, Nvidia GPU

So I've been experimenting with the PerformanceCounter class in C# and have had great success probing the CPU counters and almost everything I can find in the Windows Performance Monitor. HOWEVER, I cannot gain access to the "NVIDIA GPU" category. For example, the following line of code is how it usually works: PerformanceCounter cpuCounter = new PerformanceCounter("Processor", "% Processor Time", "_Total"); That code works fine, but the GPU category that appeared in the Performance Monitor, just as the processor category did, is not accessible from C#. The following line of code attempts

Tensorflow: GPU Utilization is almost always at 0%

Question: I'm using TensorFlow with Titan-X GPUs and I've noticed that, when I run the CIFAR10 example, the Volatile GPU-utilization is fairly constant at around 30%. When I train my own model, however, the Volatile GPU-utilization is far from steady: it is almost always 0% and spikes to 80/90% before going back to 0%, over and over again. I thought that this behavior was due to the way I was feeding the data to the network (I was fetching the data after each step, which took some time). But after

OpenGL rendering in Windows XP with multiple video cards

I'm developing an OpenGL application for Windows XP. The target machine has 2 NVIDIA GeForce 9800GT video cards, which are needed because the application has to output 2 streams of analog video. The application itself has two OpenGL windows, one for each video card. Each video card is connected to one monitor. As for the code, it's based on a minimal OpenGL example. How can I know if the application is utilizing both video cards for rendering? At the moment, I don't care if the application only runs on Windows XP or only with NVIDIA video cards, I just need to know how the two are