nvidia

Cannot compile OpenCL application using 1.2 headers in 1.1 version

◇◆丶佛笑我妖孽 submitted on 2019-11-30 09:24:29
I'm writing a small hello-world OpenCL program using the Khronos Group's cl.hpp for OpenCL 1.2 and NVIDIA's OpenCL libraries. The drivers and ICD I have support OpenCL 1.1. Since the NVIDIA side doesn't support 1.2 yet, I get errors on the functions OpenCL 1.2 requires. On the other hand, cl.hpp for OpenCL 1.2 has a flag, CL_VERSION_1_1 to be exact, to run the header in 1.1 mode, but it's not working. Has anybody had a similar experience or found a solution? Note: cl.hpp for version 1.1 works, but it generates many warnings during compilation, which is why I'm trying to use the 1.2 version. Unfortunately NVIDIA
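
A common workaround (an assumption here, not a confirmed fix for this exact setup) is to define CL_USE_DEPRECATED_OPENCL_1_1_APIS before any OpenCL include: CL_VERSION_1_1 is defined by cl.h itself, so defining it manually does not downgrade the header. A minimal sketch:

```cpp
// Sketch of the usual workaround (an assumption, not a verified fix):
// define the deprecation macro *before* any OpenCL header so the 1.2
// header keeps exposing the 1.1 entry points it would otherwise hide.
#define CL_USE_DEPRECATED_OPENCL_1_1_APIS
#include <CL/cl.hpp>
#include <iostream>
#include <vector>

int main() {
    std::vector<cl::Platform> platforms;
    cl::Platform::get(&platforms);  // available in both 1.1 and 1.2
    for (const auto& p : platforms)
        std::cout << p.getInfo<CL_PLATFORM_VERSION>() << "\n";
    return 0;
}
```

Even with the macro defined, any call that only exists in OpenCL 1.2 would still fail at link time against a 1.1 ICD, so the sketch sticks to 1.1-safe calls.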

Does AMD's OpenCL offer something similar to CUDA's GPUDirect?

点点圈 submitted on 2019-11-30 07:06:20
NVIDIA offers GPUDirect to reduce memory transfer overheads. I'm wondering if there is a similar concept for AMD/ATI? Specifically: 1) Do AMD GPUs avoid the second memory transfer when interfacing with network cards, as described here. In case the graphic is lost at some point, here is a description of the impact of GPUDirect on getting data from a GPU on one machine transferred across a network interface: with GPUDirect, GPU memory goes to host memory and then straight to the network interface card. Without GPUDirect, GPU memory goes to host memory in one address space, then the CPU has to
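
To make the non-GPUDirect path concrete, here is a hedged sketch of staging a device buffer through pinned host memory with CUDA and MPI; send_from_gpu and the buffer names are hypothetical:

```cpp
// Hypothetical staging path without GPUDirect: device -> pinned host
// buffer -> network stack. With GPUDirect (v1) the CUDA and network
// drivers can share the same pinned region, removing one host-side copy.
#include <cuda_runtime.h>
#include <mpi.h>

void send_from_gpu(const float* dev_buf, int n, int dest) {
    float* host_buf = nullptr;
    cudaMallocHost((void**)&host_buf, n * sizeof(float)); // pinned allocation
    cudaMemcpy(host_buf, dev_buf, n * sizeof(float),
               cudaMemcpyDeviceToHost);                   // GPU -> host memory
    MPI_Send(host_buf, n, MPI_FLOAT, dest, 0, MPI_COMM_WORLD); // host -> NIC
    cudaFreeHost(host_buf);
}
```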

How to prevent two CUDA programs from interfering

穿精又带淫゛_ submitted on 2019-11-30 05:42:34
Question: I've noticed that if two users try to run CUDA programs at the same time, it tends to lock up either the card or the driver (or both?). We need to either reset the card or reboot the machine to restore normal behavior. Is there a way to get a lock on the GPU so other programs can't interfere while it's running? Edit: The OS is Ubuntu 11.10 running on a server. While there is no X Windows running, the card is used to display the text system console. There are multiple users. Answer 1: If you are running
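
On the NVIDIA side the usual answer is the driver's exclusive compute mode, set by an administrator (e.g. nvidia-smi -c EXCLUSIVE_PROCESS on recent drivers; older drivers expose similar modes). A minimal sketch that checks the active mode from a program:

```cpp
// Sketch: query the device's compute mode before doing work. Setting the
// mode itself is an administrative action via nvidia-smi; this program
// only reads the current setting (device 0 assumed).
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    switch (prop.computeMode) {
        case cudaComputeModeDefault:
            std::printf("Default: multiple host processes may share the GPU\n");
            break;
        case cudaComputeModeExclusive:
            std::printf("Exclusive: only one context at a time\n");
            break;
        case cudaComputeModeProhibited:
            std::printf("Prohibited: no contexts allowed\n");
            break;
        default:
            std::printf("Other/unknown compute mode\n");
    }
    return 0;
}
```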

CUDA: What is the threads per multiprocessor and threads per block distinction? [duplicate]

∥☆過路亽.° submitted on 2019-11-30 05:27:04
This question already has an answer here: "CUDA: How many concurrent threads in total?" (3 answers). We have a workstation with two NVIDIA Quadro FX 5800 cards installed. Running the deviceQuery CUDA sample reveals that the maximum threads per multiprocessor (SM) is 1024, while the maximum threads per block is 512. Given that only one block can be executed on each SM at a time, why is the max threads per multiprocessor double the max threads per block? How do we utilise the other 512 threads per SM?
Device 1: "Quadro FX 5800"
CUDA Driver Version / Runtime Version    5.0 / 5.0
CUDA Capability Major/Minor version
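
A sketch to make the distinction concrete: a block can hold at most 512 threads on this device, but the SM's scheduler can keep two such blocks resident at once, which is how the other 512 threads per SM get used (the "one block at a time" premise is the misconception). The kernel and sizes below are illustrative:

```cpp
// Illustrative launch: with 512-thread blocks on a device allowing 1024
// resident threads per SM, the hardware can keep two blocks resident on
// each SM simultaneously and interleave their warps to hide latency.
#include <cuda_runtime.h>

__global__ void scale(float* data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    const int n = 1 << 20;
    float* d = nullptr;
    cudaMalloc(&d, n * sizeof(float));
    const int threadsPerBlock = 512;                 // device maximum here
    const int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    scale<<<blocks, threadsPerBlock>>>(d, 2.0f, n);  // 2 blocks may share an SM
    cudaDeviceSynchronize();
    cudaFree(d);
    return 0;
}
```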

OpenMP 4.0 in GCC: offload to nVidia GPU

牧云@^-^@ submitted on 2019-11-30 03:44:12
TL;DR: Does GCC (trunk) already support OpenMP 4.0 offloading to an NVIDIA GPU? If so, what am I doing wrong? (Description below.) I'm running Ubuntu 14.04.2 LTS. I have checked out the most recent GCC trunk (dated 25 Mar 2015). I have installed the CUDA 7.0 toolkit according to the Getting Started on Ubuntu guide; the CUDA samples run successfully, i.e. deviceQuery detects my GeForce GT 730. I have followed the instructions from https://gcc.gnu.org/wiki/Offloading as well as https://gcc.gnu.org/install/specific.html#nvptx-x-none and installed nvptx-tools and nvptx-newlib (configure, make, sudo
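
For reference, a minimal OpenMP 4.0 target region one would use as a smoke test, assuming a GCC trunk built with the nvptx offload compiler as in the wiki instructions (compiled with something like g++ -fopenmp -foffload=nvptx-none):

```cpp
// Minimal OpenMP 4.0 offload smoke test (assumes a GCC built with the
// nvptx accelerator toolchain). If offloading is not configured, the
// loop silently falls back to running on the host.
#include <cstdio>

int main() {
    const int n = 1024;
    int a[n];
    #pragma omp target teams distribute parallel for map(from: a[0:n])
    for (int i = 0; i < n; ++i)
        a[i] = 2 * i;                     // runs on the GPU if offload works
    std::printf("a[100] = %d\n", a[100]); // expect 200 either way
    return 0;
}
```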

VAO and element array buffer state

杀马特。学长 韩版系。学妹 submitted on 2019-11-30 00:09:37
I was recently writing some OpenGL 3.3 code with Vertex Array Objects (VAO) and tested it later on an Intel graphics adapter, where I found, to my disappointment, that the element array buffer binding is evidently not part of VAO state, as calling:
glBindVertexArray(my_vao);
glDrawElements(GL_TRIANGLE_STRIP, count, GL_UNSIGNED_INT, 0);
had no effect, while:
glBindVertexArray(my_vao);
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, my_index_buffer); // ?
glDrawElements(GL_TRIANGLE_STRIP, count, GL_UNSIGNED_INT, 0);
rendered the geometry. I thought it was a mere bug in Intel's implementation of OpenGL
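
For context, the GL 3.3 core specification does make the GL_ELEMENT_ARRAY_BUFFER binding part of VAO state, so a setup like the sketch below (hypothetical data and attribute layout) should let a bare glBindVertexArray plus glDrawElements work; a driver that forgets the binding is non-conformant:

```cpp
// Sketch of VAO setup (GL 3.3 core; context and loader setup omitted).
GLfloat vertices[] = { -0.5f, -0.5f, 0.0f,
                        0.5f, -0.5f, 0.0f,
                       -0.5f,  0.5f, 0.0f,
                        0.5f,  0.5f, 0.0f };
GLuint indices[] = { 0, 1, 2, 3 };

GLuint vao, vbo, ibo;
glGenVertexArrays(1, &vao);
glGenBuffers(1, &vbo);
glGenBuffers(1, &ibo);

glBindVertexArray(vao);                      // start recording VAO state

glBindBuffer(GL_ARRAY_BUFFER, vbo);          // NOT VAO state by itself
glBufferData(GL_ARRAY_BUFFER, sizeof(vertices), vertices, GL_STATIC_DRAW);
glEnableVertexAttribArray(0);
glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 0, nullptr); // this IS VAO state

glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, ibo);  // captured by the bound VAO
glBufferData(GL_ELEMENT_ARRAY_BUFFER, sizeof(indices), indices, GL_STATIC_DRAW);

glBindVertexArray(0);                        // unbind the VAO before the IBO

// later, per frame:
glBindVertexArray(vao);
glDrawElements(GL_TRIANGLE_STRIP, 4, GL_UNSIGNED_INT, nullptr);
```

Note the unbind order: unbinding the element array buffer while the VAO is still bound would record the unbind into the VAO.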

Fedora 19 using rpmfussion's NVIDIA driver: libGL error: failed to load driver: swrast

不想你离开。 submitted on 2019-11-29 23:31:27
Question: When running an app that uses Qt 4.7 on my Fedora 19 box, I am getting the following errors from the application:
libGL: screen 0 does not appear to be DRI2 capable
libGL: OpenDriver: trying /usr/lib64/dri/tls/swrast_dri.so
libGL: OpenDriver: trying /usr/lib64/dri/swrast_dri.so
libGL: Can't open configuration file /home/Matthew.Hoggan/.drirc: No such file or directory.
libGL error: failed to load driver: swrast
ERROR: Error failed to create progam.
I do not see these errors in a stock X11

Running more than one CUDA applications on one GPU

核能气质少年 submitted on 2019-11-29 20:43:15
The CUDA documentation does not specify how many CUDA processes can share one GPU. For example, if I launch more than one CUDA program as the same user with only one GPU card installed in the system, what is the effect? Will correctness of execution still be guaranteed? How does the GPU schedule tasks in this case? CUDA activity from independent host processes will normally create independent CUDA contexts, one for each process. Thus, the CUDA activity launched from separate host processes will take place in separate CUDA contexts on the same device. CUDA activity in separate contexts will be serialized
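
A toy program (names and sizes hypothetical) that two shells can run at once to observe this: each process creates its own context on device 0, and without MPS the two contexts' kernels time-slice rather than overlap:

```cpp
// Toy program to launch from two shells simultaneously. Each process gets
// its own CUDA context on device 0; their kernels are serialized between
// contexts, but both programs still complete correctly.
#include <cuda_runtime.h>
#include <cstdio>

__global__ void spin(long long cycles) {
    long long start = clock64();
    while (clock64() - start < cycles) { /* busy-wait on the GPU */ }
}

int main() {
    cudaSetDevice(0);            // both processes target the same device
    spin<<<1, 1>>>(1LL << 30);   // long-running kernel from this context
    cudaDeviceSynchronize();
    std::printf("done; this process's context is destroyed at exit\n");
    return 0;
}
```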

Do I have to use the MPS (MULTI-PROCESS SERVICE) when using CUDA6.5 + MPI?

孤者浪人 submitted on 2019-11-29 16:27:54
The linked document, https://docs.nvidia.com/deploy/pdf/CUDA_Multi_Process_Service_Overview.pdf, states (1.1. At a Glance, 1.1.1. MPS): The Multi-Process Service (MPS) is an alternative, binary-compatible implementation of the CUDA Application Programming Interface (API). The MPS runtime architecture is designed to transparently enable co-operative multi-process CUDA applications, typically MPI jobs, to utilize Hyper-Q capabilities on the latest NVIDIA (Kepler-based) Tesla and Quadro GPUs. Hyper-Q allows CUDA kernels to be processed concurrently on the same GPU; this can benefit performance when the
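
For illustration, a hedged sketch of an MPI program in which every rank shares one GPU. It runs correctly without MPS; starting the MPS control daemon (nvidia-cuda-mps-control -d in CUDA 6.5-era tooling, an assumption worth checking against the linked PDF) only changes whether the ranks' kernels may overlap via Hyper-Q:

```cpp
// Sketch: all MPI ranks use device 0. Without MPS each rank's context is
// time-sliced on the GPU; with MPS their kernels can run concurrently on
// Kepler-class hardware. Correctness is the same either way.
#include <mpi.h>
#include <cuda_runtime.h>
#include <cstdio>

__global__ void fill(float* p, float v, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) p[i] = v;
}

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    cudaSetDevice(0);                       // all ranks share device 0
    const int n = 1 << 20;
    float* d = nullptr;
    cudaMalloc(&d, n * sizeof(float));
    fill<<<(n + 255) / 256, 256>>>(d, (float)rank, n);
    cudaDeviceSynchronize();
    std::printf("rank %d done\n", rank);

    cudaFree(d);
    MPI_Finalize();
    return 0;
}
```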

clEnqueueNDRange blocking on Nvidia hardware? (Also Multi-GPU)

好久不见. submitted on 2019-11-29 16:18:38
On NVIDIA GPUs, when I call clEnqueueNDRangeKernel, the program waits for it to finish before continuing. More precisely, I'm calling its equivalent C++ binding, cl::CommandQueue::enqueueNDRangeKernel, but this shouldn't make a difference. This only happens on NVIDIA hardware (3 Tesla M2090s) used remotely; on our office workstations with AMD GPUs, the call is non-blocking and returns immediately. I don't have local NVIDIA hardware to test on; we used to, and I remember similar behavior then too, but it's a bit hazy. This makes spreading the work across multiple GPUs harder. I've tried starting a new thread for
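
A common pattern with the C++ bindings is to enqueue on every device's queue first, then flush all queues so the work is actually submitted before anything blocks. A sketch under the assumption that per-device queues and the kernel are already set up (names are placeholders):

```cpp
// Sketch of the usual multi-GPU pattern: enqueue on every device's queue,
// flush them all so the commands are submitted without blocking, and only
// then wait. Queue/kernel construction is omitted.
#include <CL/cl.hpp>
#include <vector>

void run_on_all(std::vector<cl::CommandQueue>& queues,
                cl::Kernel& kernel, size_t global_size) {
    for (auto& q : queues)
        q.enqueueNDRangeKernel(kernel, cl::NullRange,
                               cl::NDRange(global_size), cl::NullRange);
    for (auto& q : queues)
        q.flush();    // push commands to each device without blocking
    for (auto& q : queues)
        q.finish();   // now wait for completion on all devices
}
```

If the implementation really blocks inside the enqueue call itself, this pattern will not help, and driving each device from its own host thread is the usual fallback.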