nsight | 易学教程

Member “has already been declared” error with CUDA and Eigen

Question: I'm just a beginner with CUDA and Nsight and want to take advantage of GPU performance for linear algebra operations (e.g. cuBLAS). I have a lot of custom code written with Eigen, with many matrix multiplication operations, so I wanted to leave my code unchanged and just run those operations on the GPU. I created a sample project with Visual Studio Nsight and it worked fine, but when I add the line #include <Eigen/Dense> to that project, I get the following errors: 1>------ Build

Why am I failing to overlap data transfers and computation with GTX 480 and CUDA 5?

Question: I have tried to overlap kernel execution with cudaMemcpyAsync, but it doesn't work. I follow all the recommendations in the programming guide: pinned memory, different streams, etc. I see that kernel executions overlap with each other, but they do not overlap with memory transfers. I know my card has only one copy engine and one execution engine, but execution and transfers should overlap, right? It seems the "copy engine" and "execution engine" always enforce the order in which I call the functions. The work consists of 4 streams performing
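
For readers unfamiliar with the setup this question describes (pinned host memory, several streams, asynchronous copies), here is a minimal sketch of the general pattern. It is not the asker's code: the kernel `scale`, the chunk size, and the depth-first issue order are all assumptions made for illustration. Whether copies and kernels actually overlap depends on the GPU's copy engines and on the order in which work is issued, so the timeline should be checked in a profiler.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel (not from the question): doubles each element of a chunk.
__global__ void scale(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main()
{
    const int kStreams = 4;            // the question mentions 4 streams
    const int kChunk   = 1 << 20;      // elements per stream (assumed value)
    const size_t bytes = kChunk * sizeof(float);

    float *h = 0, *d = 0;
    // Pinned (page-locked) host memory is required for truly asynchronous copies.
    cudaMallocHost((void **)&h, kStreams * bytes);
    cudaMalloc((void **)&d, kStreams * bytes);
    for (int i = 0; i < kStreams * kChunk; ++i) h[i] = 1.0f;

    cudaStream_t streams[kStreams];
    for (int i = 0; i < kStreams; ++i) cudaStreamCreate(&streams[i]);

    // Each stream copies its chunk in, runs the kernel, and copies the result out.
    // Operations within one stream run in order; different streams may overlap.
    for (int i = 0; i < kStreams; ++i) {
        float *hp = h + i * kChunk;
        float *dp = d + i * kChunk;
        cudaMemcpyAsync(dp, hp, bytes, cudaMemcpyHostToDevice, streams[i]);
        scale<<<(kChunk + 255) / 256, 256, 0, streams[i]>>>(dp, kChunk);
        cudaMemcpyAsync(hp, dp, bytes, cudaMemcpyDeviceToHost, streams[i]);
    }

    // Wait for all streams, then report any error raised along the way.
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess)
        printf("CUDA error: %s\n", cudaGetErrorString(err));
    else
        printf("h[0] = %f\n", h[0]);

    for (int i = 0; i < kStreams; ++i) cudaStreamDestroy(streams[i]);
    cudaFree(d);
    cudaFreeHost(h);
    return err == cudaSuccess ? 0 : 1;
}
```

The loop issues work depth-first (all operations of one stream before the next). Issuing breadth-first instead (all host-to-device copies, then all kernels, then all device-to-host copies) is a common variation to try when the timeline shows no overlap.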

NVIDIA Parallel Nsight Vs Visual Profiler

Question: I am working with CUDA on the Windows platform, where we have access to both Parallel Nsight and Visual Profiler. Both are pretty good, but they have nearly the same features for profiling and tracing. Can someone tell me how they differ and which one is better on Windows? I will basically need a tool for profiling. Answer 1: Nsight Visual Studio Edition 2.2 offers the following advantages over the Visual Profiler: OVERALL Integration into

CUDA Parallel NSight Debugging host and device simultaneously

Does anyone know if it's possible to debug CUDA using Parallel Nsight on a remote machine? I am able to step into CUDA code but not my host code. It says CUDA has the capability to generate host debug information, so debugging remotely and locally should be possible. My card is a GTX 580. //device code <-- able to debug device code //host code <---- when device code returns, should be able to debug host code Thanks!

Simultaneous GPU/CPU debugging from a single IDE instance is unfortunately not possible with the current releases of Nsight and Visual Studio. As a workaround, you can start GPU

nsight eclipse remote debugging timed out error

I have a server running CentOS 6.0 and I'm trying to use it as a remote host for CUDA debugging. To do this, I installed CUDA Toolkit 5.5 on both the server and my notebook, which runs Ubuntu 12.10. I configured the two machines as the NVIDIA CUDA instructions told me, yet when I started Nsight Eclipse Edition and tried to remote debug my CUDA applications, I ran into an error that says: Failed to execute MI command: -target-select remote 192.168.2.105:2345 Error message from debugger back end: 192.168.2.105:2345: Connection timed out I googled this error; some people say it

can't enter into __global__ function using cuda

I have written code in Nsight that compiles and can be executed, but the first launch can't be completed. The strange thing is that when I run it in debug mode, it works perfectly, but it is too slow. Here is the part of the code before entering the function that accesses the GPU (where I think there is an error I can't find): void parallelAction (int * dataReturned, char * data, unsigned char * descBase, int range, int cardBase, int streamIdx) { size_t inputBytes = range*128*sizeof(unsigned char); size_t baseBytes = cardBase*128*sizeof(unsigned char); size_t outputBytes = range*sizeof(int);

Nsight remote debugger settings

I am trying to set up a remote Nsight v2.2 debugger for GPU debugging only (no CUDA). I have followed this NVIDIA PDF for setting up the remote target machine and the development machine. Both are up and running, but the communication between the two is not working properly. I am getting errors such as MSVSMON.exe not running on the remote machine. I am not sure about the exact settings required. What should the following be: the VS2010 project settings, the Nsight Monitor settings (remote machine), and the Nsight settings in VS2010 on the development machine?

What kind of activity are you trying to do (CUDA, API or shader DX debugging)? If

False dependency issue for the Fermi architecture

I am trying to achieve "3-way overlapping" using 3 streams, as in the examples in the CUDA streams and concurrency webinar, but I couldn't achieve it. I have a GeForce GT 550M (Fermi architecture with one copy engine) and I am using Windows 7 (64-bit). Here is the code that I have written. #include <iostream> #include "cuda_runtime.h" #include "device_launch_parameters.h" // includes, project #include "helper_cuda.h" #include "helper_functions.h" // helper utility functions #include <stdio.h> using namespace std; #define DATA_SIZE 6000000 #define NUM_THREADS 32 #define NUM_BLOCKS 16 #define NUM

How to debug cuda thrust functions in visual studio 2010 with parallel nsight

I am using Visual Studio 2010, Parallel Nsight 2.2 and CUDA 4.2 for learning. My system is Windows 8 Pro x64. I opened the radix sort project that is included in the CUDA computing SDK in VS and compiled it with no errors. The sort code uses the Thrust library: if(keysOnly) thrust::sort(d_keys.begin(), d_keys.end()); else thrust::sort_by_key(d_keys.begin(), d_keys.end(), d_values.begin()); I want to know how Thrust dispatches the sort function to CUDA kernels, so I tried to add breakpoints before the lines above and compiled the project in debug mode. But when I use Parallel Nsight for CUDA debugging,
