nvidia

CUDA atomicAdd for doubles definition error

Submitted by 魔方 西西 on 2019-12-04 19:05:00
Question: In previous versions of CUDA, atomicAdd was not implemented for doubles, so it was common to implement it manually, as shown here. With the new CUDA 8 RC, I run into trouble when I try to compile code that includes such a function. I guess this is because, with Pascal and Compute Capability 6.0, a native double version of atomicAdd has been added, but somehow it is not properly ignored for earlier compute capabilities. The code below used to compile and run fine with previous CUDA …
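
The usual fix is to compile the hand-rolled version only for architectures that lack the native one. Below is a minimal sketch of that guard; the CAS-loop body is the well-known workaround the question refers to, and the surrounding #if is what CUDA 8 effectively requires:

    #if defined(__CUDA_ARCH__) && __CUDA_ARCH__ < 600
    // Hand-rolled double-precision atomicAdd for pre-Pascal devices only.
    __device__ double atomicAdd(double* address, double val)
    {
        unsigned long long int* address_as_ull = (unsigned long long int*)address;
        unsigned long long int old = *address_as_ull, assumed;
        do {
            assumed = old;
            // Reinterpret the bits, add, and swap back only if no other thread
            // changed the value in the meantime.
            old = atomicCAS(address_as_ull, assumed,
                            __double_as_longlong(val + __longlong_as_double(assumed)));
        } while (assumed != old);
        return __longlong_as_double(old);
    }
    #endif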

Is stereoscopy (3D stereo) making a come back?

Submitted by 一世执手 on 2019-12-04 17:56:24
I'm working on a stereoscopy application in C++ and OpenGL (for medical image visualization). From what I understand, the technology was quite big news about 10 years ago but it seems to have died down since. Now, many companies seem to be investing in the technology again, including nVidia, it would seem. Stereoscopy is also known as "3D Stereo", primarily by nVidia (I think). Does anyone see stereoscopy as a major technology in terms of how we visualize things? I'm talking in both a recreational and professional capacity. With nVidia's 3D kit you don't need to "make a stereoscopy application", …

Why should I use the CUDA Driver API instead of CUDA Runtime API?

Submitted by 落爺英雄遲暮 on 2019-12-04 14:29:16
Question: Why should I use the CUDA Driver API, and in which cases can I not use the CUDA Runtime API (which is more convenient than the Driver API)? Answer 1: The runtime API is a higher level of abstraction over the driver API and is usually easier to use (the performance gap should be minimal). The driver API is handle-based and provides a higher degree of control. The runtime API, on the contrary, is easier to use (e.g. you can use the kernel<<<>>> launch syntax). That "higher degree of control" means …
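
For comparison, here is a rough sketch of a driver API launch next to the runtime equivalent. The module file kernels.ptx and the kernel vecAdd(float*, int) inside it are placeholder names, and error checking is omitted:

    #include <cuda.h>
    #include <cstdio>

    int main()
    {
        // With the runtime API this whole setup is implicit and a launch is just
        //   vecAdd<<<grid, block>>>(d_data, n);
        // With the driver API every step is explicit and handle-based:
        CUdevice   dev;
        CUcontext  ctx;
        CUmodule   mod;
        CUfunction fn;
        cuInit(0);
        cuDeviceGet(&dev, 0);
        cuCtxCreate(&ctx, 0, dev);
        cuModuleLoad(&mod, "kernels.ptx");        // PTX/cubin built separately, e.g. nvcc -ptx
        cuModuleGetFunction(&fn, mod, "vecAdd");  // look the kernel up by name

        int n = 1024;
        CUdeviceptr d_data;
        cuMemAlloc(&d_data, n * sizeof(float));
        void* args[] = { &d_data, &n };
        cuLaunchKernel(fn, (n + 255) / 256, 1, 1, // grid
                           256, 1, 1,             // block
                           0, nullptr,            // shared memory, stream
                           args, nullptr);
        cuCtxSynchronize();
        cuMemFree(d_data);
        cuCtxDestroy(ctx);
        printf("done\n");
        return 0;
    }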

How do I update cuDNN to a newer version?

Submitted by 孤街浪徒 on 2019-12-04 14:29:08
Question: The cuDNN installation manual says: ALL PLATFORMS — Extract the cuDNN archive to a directory of your choice, referred to below as <installpath>. Then follow the platform-specific instructions as follows. LINUX — cd <installpath>, then export LD_LIBRARY_PATH=`pwd`:$LD_LIBRARY_PATH. Add <installpath> to your build and link process by adding -I<installpath> to your compile line and -L<installpath> -lcudnn to your link line. It seems that it simply adds pwd to LD_LIBRARY_PATH, so I guess just replacing the files in <installpath> will do the update. But it seems it is not that simple, as …
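
One simple way to confirm that a replaced cuDNN actually took effect is to compare the version in the headers you compile against with the version of the shared library the loader resolves at run time. A small check, assuming cudnn.h is on the include path and the program is linked with -lcudnn:

    #include <cudnn.h>
    #include <cstdio>

    int main()
    {
        // Version baked into the headers at compile time.
        printf("compiled against cuDNN %d.%d.%d\n",
               CUDNN_MAJOR, CUDNN_MINOR, CUDNN_PATCHLEVEL);
        // Version of the library found through LD_LIBRARY_PATH at run time.
        printf("loaded library reports %zu\n", cudnnGetVersion());
        return 0;
    }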

How could we generate random numbers in CUDA C with a different seed on each run?

Submitted by 落爺英雄遲暮 on 2019-12-04 13:52:54
I am working on a stochastic process and I want to generate a different series of random numbers in the CUDA kernel each time I run the program. This is similar to what we do in C++ by declaring seed = time(NULL) followed by srand(seed) and rand(). I can pass seeds from host to device via the kernel, but the problem with doing this is that I would have to pass an entire array of seeds into the kernel for each thread to have a different random seed each time. Is there a way I could generate a random seed, process ID, machine time or something like that within the kernel and use it as a seed? JackOLantern: …
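
A minimal sketch of the common cuRAND approach (not the quoted answer itself): pass a single seed taken from the host clock and let each thread use its own index as the sequence number, so no per-thread seed array is needed.

    #include <cuda_runtime.h>
    #include <curand_kernel.h>
    #include <cstdio>
    #include <ctime>

    __global__ void generate(unsigned long long seed, float* out, int n)
    {
        int tid = blockIdx.x * blockDim.x + threadIdx.x;
        if (tid >= n) return;
        curandState_t state;
        // Same seed everywhere, but a per-thread sequence number gives each
        // thread an independent, non-overlapping subsequence.
        curand_init(seed, tid, 0, &state);
        out[tid] = curand_uniform(&state);
    }

    int main()
    {
        const int n = 256;
        float *d_out, h_out[n];
        cudaMalloc(&d_out, n * sizeof(float));
        // A single seed from the host clock changes on every run of the program.
        generate<<<(n + 127) / 128, 128>>>((unsigned long long)time(NULL), d_out, n);
        cudaMemcpy(h_out, d_out, n * sizeof(float), cudaMemcpyDeviceToHost);
        printf("first value: %f\n", h_out[0]);
        cudaFree(d_out);
        return 0;
    }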

nvEncRegisterResource() fails with -23

Submitted by 一世执手 on 2019-12-04 13:10:36
I've hit a complete brick wall in my attempt to use NVEnc to stream OpenGL frames as H264. I've been at this particular issue for close to 8 hours without any progress. The problem is the call to nvEncRegisterResource(), which invariably fails with code -23 (enum value NV_ENC_ERR_RESOURCE_REGISTER_FAILED, documented as "failed to register the resource" - thanks NVidia). I'm trying to follow a procedure outlined in this document from the University of Oslo (page 54, "OpenGL interop"), so I know for a fact that this is supposed to work, though unfortunately said document does not provide the …

L2 cache in NVIDIA Fermi

Submitted by £可爱£侵袭症+ on 2019-12-04 11:26:56
When looking at the names of the performance counters in the NVIDIA Fermi architecture (the file Compute_profiler.txt in the doc folder of CUDA), I noticed that for L2 cache misses there are two performance counters, l2_subp0_read_sector_misses and l2_subp1_read_sector_misses. They say these are for two slices of L2. Why are there two slices of L2? Is there any relation with the streaming multiprocessor architecture? What would be the effect of this division on performance? Thanks. I don't think there is any direct relation with the streaming multiprocessor. I just think that a slice is …

OpenGL 3: glBindVertexArray invalidates GL_ELEMENT_ARRAY_BUFFER

Submitted by 不打扰是莪最后的温柔 on 2019-12-04 11:23:11
Question: I was certain that if you bind a buffer via glBindBuffer(), you can safely assume that it stays bound until the target is rebound through another call to glBindBuffer(). I was therefore quite surprised when I discovered that calling glBindVertexArray() sets the buffer bound to the GL_ELEMENT_ARRAY_BUFFER target to 0. Here's the minimal C++ sample code: GLuint buff; glGenBuffers(1, &buff); std::cout << "Buffer is " << buff << "\n"; glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, buff); GLuint vao; …
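
The behaviour is by design: the GL_ELEMENT_ARRAY_BUFFER binding is part of the vertex array object's state, so binding a freshly created VAO swaps that binding out. A sketch of how to observe it (assumes a current GL 3+ context and an already-initialized loader such as GLEW or glad):

    GLuint buff = 0, vao = 0;
    GLint bound = 0;

    glGenBuffers(1, &buff);
    glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, buff);
    glGetIntegerv(GL_ELEMENT_ARRAY_BUFFER_BINDING, &bound);  // reports buff

    glGenVertexArrays(1, &vao);
    glBindVertexArray(vao);                                  // element-array binding lives in the VAO
    glGetIntegerv(GL_ELEMENT_ARRAY_BUFFER_BINDING, &bound);  // now 0: the new VAO has no index buffer

    glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, buff);             // rebinding stores buff in this VAO's state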

__forceinline__ effect at CUDA C __device__ functions

Submitted by 拥有回忆 on 2019-12-04 11:19:32
Question: There is a lot of advice on when to use inline functions and when to avoid them in regular C coding. What is the effect of __forceinline__ on CUDA C __device__ functions? Where should they be used and where should they be avoided? Answer 1: Normally the nvcc device code compiler will make its own decisions about when to inline a particular __device__ function and, generally speaking, you probably don't need to worry about overriding that with the __forceinline__ decorator/directive. cc 1.x devices don't have …
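
For reference, the qualifier is simply prepended to the function declaration; a small sketch (the helper and kernel names are made up for illustration, and whether forcing inlining helps at all is workload-dependent):

    // __forceinline__ asks nvcc to inline this helper at every call site.
    __forceinline__ __device__ float squared_distance(float3 a, float3 b)
    {
        float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
        return dx * dx + dy * dy + dz * dz;
    }

    __global__ void distances(const float3* pts, float3 query, float* out, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            out[i] = squared_distance(pts[i], query);  // inlined: no call overhead, but larger code
    }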

Ubuntu kworker thread consumes 100% CPU [closed]

Submitted by 萝らか妹 on 2019-12-04 10:53:48
I had a question and was unable to find the answer (easily). On my Ubuntu installation, a kworker thread was consuming 100% CPU, which caused my computer to be very slow or to crash at times. aMaia: If you run the command grep . -r /sys/firmware/acpi/interrupts/ and check for any high value, like: /sys/firmware/acpi/interrupts/sci: 264 /sys/firmware/acpi/interrupts/error: 0 /sys/firmware/acpi/interrupts/gpe00: 264 enabled /sys/firmware/acpi/interrupts/gpe01: 0 invalid ... /sys/firmware/acpi/interrupts/gpe1F: 0 invalid /sys/firmware/acpi/interrupts/sci_not: 0 /sys/firmware/acpi/interrupts/ff …