nvidia

Selective nvidia #pragma optionNV(unroll all)

扶醉桌前 submitted on 2019-12-10 09:51:55
Question: I'm playing around with NVIDIA's unroll loops directive, but haven't seen a way to turn it on selectively. Let's say I have this... void testUnroll() { #pragma optionNV(unroll all) for (...) ... } void testNoUnroll() { for (...) ... } Here, I'm assuming both loops end up being unrolled. To stop this, I think the solution will involve resetting the directive after the block I want affected, for example: #pragma optionNV(unroll all) for (...) ... #pragma optionNV(unroll default) //?? However I
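For comparison, CUDA C exposes this kind of per-loop control directly: a #pragma unroll placed immediately before a loop affects only that loop. A minimal sketch of the CUDA analogue (not the GLSL #pragma optionNV behaviour asked about; kernel names are invented):

__global__ void unrolledSum(const float* in, float* out)
{
    float acc = 0.0f;
    #pragma unroll 8              // unroll only this loop, by a factor of 8
    for (int i = 0; i < 64; ++i)
        acc += in[i];
    out[threadIdx.x] = acc;
}

__global__ void plainSum(const float* in, float* out)
{
    float acc = 0.0f;
    #pragma unroll 1              // explicitly keep this loop rolled
    for (int i = 0; i < 64; ++i)
        acc += in[i];
    out[threadIdx.x] = acc;
}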

What is the difference between the CUDA toolkit and the CUDA SDK?

跟風遠走 submitted on 2019-12-10 04:10:09
Question: I am installing CUDA on Ubuntu 14.04 and have a Maxwell card (GTX 9** series), and I think I have installed everything properly with the toolkit, as I can compile the samples. However, I read in places that I should install the SDK (this appears to be discussed in relation to SDK 4). I am not sure whether the toolkit and the SDK are different things. As I have a later 9-series card, does that mean I have CUDA 6 running? Here is my nvcc version: nvcc: NVIDIA (R) Cuda compiler driver Copyright (c) 2005-2014
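To answer the "what do I actually have" part at runtime, the CUDA runtime API can report the toolkit/driver versions and each card's compute capability. A minimal sketch, assuming a working nvcc install (standard runtime API usage, not taken from the original post):

#include <cstdio>
#include <cuda_runtime.h>

// Report the installed runtime/driver versions and the compute capability
// of each visible GPU. Compile with: nvcc -o versions versions.cu
int main()
{
    int runtimeVer = 0, driverVer = 0, deviceCount = 0;
    cudaRuntimeGetVersion(&runtimeVer);   // encoded as 1000*major + 10*minor, e.g. 6050 = CUDA 6.5
    cudaDriverGetVersion(&driverVer);
    cudaGetDeviceCount(&deviceCount);

    printf("Runtime: %d, Driver: %d, Devices: %d\n",
           runtimeVer, driverVer, deviceCount);

    for (int d = 0; d < deviceCount; ++d) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, d);
        printf("Device %d: %s, compute capability %d.%d\n",
               d, prop.name, prop.major, prop.minor);   // Maxwell cards report 5.x
    }
    return 0;
}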

Dynamic Allocation of Constant memory in CUDA

大兔子大兔子 submitted on 2019-12-10 03:56:07
Question: I'm trying to take advantage of constant memory, but I'm having a hard time figuring out how to nest arrays. What I have is an array of data entries that each carry counts for internal data, but those counts are different for each entry. Based on the following simplified code, I have two problems. First, I don't know how to allocate the data pointed to by the members of my data structure. Second, since I can't use cudaGetSymbolAddress for constant memory, I'm not sure if I can just pass the global pointer
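A common workaround for this situation (a sketch, not the poster's code) is to keep only the small fixed-size descriptors in __constant__ memory and let them hold ordinary device pointers into cudaMalloc'd global memory; the filled-in struct array is then copied to the constant symbol with cudaMemcpyToSymbol. Struct and function names below are invented:

#include <cuda_runtime.h>

// Hypothetical descriptor: a count plus a pointer to per-entry data that
// lives in ordinary global memory (constant memory cannot be allocated
// dynamically, but it can store device pointers).
struct Entry {
    int    count;
    float* values;                  // device pointer into global memory
};

__constant__ Entry d_entries[4];    // fixed-size table in constant memory

__global__ void sumEntries(float* out)
{
    const Entry& e = d_entries[blockIdx.x];
    float sum = 0.0f;
    for (int i = 0; i < e.count; ++i)
        sum += e.values[i];
    out[blockIdx.x] = sum;
}

// Host side: allocate the variable-length payloads in global memory, record
// their device pointers in host-side structs, then copy the struct array
// into the constant symbol.
void setup(const float* const hostData[], const int counts[])
{
    Entry h_entries[4];
    for (int i = 0; i < 4; ++i) {
        h_entries[i].count = counts[i];
        cudaMalloc(&h_entries[i].values, counts[i] * sizeof(float));
        cudaMemcpy(h_entries[i].values, hostData[i],
                   counts[i] * sizeof(float), cudaMemcpyHostToDevice);
    }
    cudaMemcpyToSymbol(d_entries, h_entries, sizeof(h_entries));
}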

Detecting if the monitor is powered off

旧城冷巷雨未停 submitted on 2019-12-10 02:58:52
Question: I have a kiosk-type application and I need to be notified if the LCD TV is powered off so I can chastise someone. I'm running Ubuntu 10.10 with nVidia video cards and the nVidia drivers. The TVs are plugged in via HDMI. I've taken a look at nvidia-settings -q ConnectedDisplays and nvidia-settings -q EnabledDisplays, but both always report that the monitor is connected. I'm guessing this value is only set once, when the monitor is first powered on? I've also looked at xrandr --properties and
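One avenue worth probing (a sketch only, and subject to the same caveat the questioner raises about the driver not updating connection state for HDMI TVs) is to poll the RandR connection status of each output from code rather than via xrandr:

#include <cstdio>
#include <X11/Xlib.h>
#include <X11/extensions/Xrandr.h>

// Poll each RandR output's connection state. Whether this changes when an
// HDMI TV is powered off depends on the driver and the TV.
// Build with: g++ poll.cpp -lX11 -lXrandr
int main()
{
    Display* dpy = XOpenDisplay(nullptr);
    if (!dpy) return 1;

    Window root = DefaultRootWindow(dpy);
    XRRScreenResources* res = XRRGetScreenResources(dpy, root);

    for (int i = 0; i < res->noutput; ++i) {
        XRROutputInfo* out = XRRGetOutputInfo(dpy, res, res->outputs[i]);
        printf("%s: %s\n", out->name,
               out->connection == RR_Connected ? "connected" : "disconnected");
        XRRFreeOutputInfo(out);
    }
    XRRFreeScreenResources(res);
    XCloseDisplay(dpy);
    return 0;
}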

NVidia CUDA toolkit 7.5.27 failing to install on OS X

浪子不回头ぞ submitted on 2019-12-10 00:47:31
Question: Downloading the CUDA toolkit DMG works, but the installer fails with a cryptic "package manifest parsing error" after selecting packages. Running the installer from the command line using the binary inside fails in a similar manner. The log file at /var/log/cuda_installer.log basically says the same: Apr 28 18:16:10 CUDAMacOSXInstaller[58493] : Awoken from nib! Apr 28 18:16:10 CUDAMacOSXInstaller[58493] : Switched to local mode. Apr 28 18:16:24 CUDAMacOSXInstaller[58493] : Package

OpenCL crashes on call to clGetPlatformIDs

若如初见. submitted on 2019-12-09 18:27:37
Question: I am new to OpenCL. I am working on a Core i5 machine with Intel(R) HD Graphics 4000, running Windows 7. I installed the newest Intel driver with support for OpenCL, and GpuCapsViewer confirms I have OpenCL support set up. I developed a simple HelloWorld program using the Intel OpenCL SDK. The program compiles successfully, but when run it crashes on the call to clGetPlatformIDs() with a segmentation fault. This is my code: #include <iostream> #include <CL/opencl.h> int main() { std::cout << "Test OCL
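For reference, a defensive version of that first call looks like the sketch below (standard OpenCL 1.x host API; a crash inside clGetPlatformIDs itself usually points at a broken ICD/driver installation rather than a bug in this code):

#include <iostream>
#include <vector>
#include <CL/opencl.h>

// Query the number of platforms first, check the return code, and only then
// fetch the platform IDs and print their names.
int main()
{
    cl_uint numPlatforms = 0;
    cl_int err = clGetPlatformIDs(0, nullptr, &numPlatforms);
    if (err != CL_SUCCESS || numPlatforms == 0) {
        std::cerr << "clGetPlatformIDs failed: " << err << "\n";
        return 1;
    }

    std::vector<cl_platform_id> platforms(numPlatforms);
    err = clGetPlatformIDs(numPlatforms, platforms.data(), nullptr);
    if (err != CL_SUCCESS) {
        std::cerr << "clGetPlatformIDs failed: " << err << "\n";
        return 1;
    }

    for (cl_platform_id p : platforms) {
        char name[256] = {0};
        clGetPlatformInfo(p, CL_PLATFORM_NAME, sizeof(name), name, nullptr);
        std::cout << "Platform: " << name << "\n";
    }
    return 0;
}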

Can't we use atomic operations for floating point variables in CUDA?

落花浮王杯 submitted on 2019-12-09 17:35:05
Question: I have used atomicMax() to find the maximum value in a CUDA kernel: __global__ void global_max(float* values, float* gl_max) { int i = threadIdx.x + blockDim.x * blockIdx.x; float val = values[i]; atomicMax(gl_max, val); } It is throwing the following error: error: no instance of overloaded function "atomicMax" matches the argument list. The argument types are: (float*, float). Answer 1: The short answer is no. As you can see from the atomic function documentation, only integer arguments are
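A frequently used workaround, sketched below rather than quoted from the answer, is to emulate atomicMax for float with atomicCAS on the value's bit pattern:

// Emulate atomicMax for float using atomicCAS on the bit pattern.
// NaN handling is ignored; the target is a single float in global memory.
__device__ float atomicMaxFloat(float* address, float val)
{
    int* address_as_int = reinterpret_cast<int*>(address);
    int old = *address_as_int;
    int assumed;
    do {
        assumed = old;
        float current = __int_as_float(assumed);
        if (current >= val) break;          // already at least as large, nothing to do
        old = atomicCAS(address_as_int, assumed, __float_as_int(val));
    } while (assumed != old);               // retry if another thread won the race
    return __int_as_float(old);
}

__global__ void global_max(const float* values, float* gl_max)
{
    int i = threadIdx.x + blockDim.x * blockIdx.x;
    atomicMaxFloat(gl_max, values[i]);
}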

CUDA C programming with 2 video cards

孤街醉人 submitted on 2019-12-09 16:41:07
Question: I am very new to CUDA programming and was reading the 'CUDA C Programming Guide' provided by NVIDIA (http://developer.download.nvidia.com/compute/DevZone/docs/html/C/doc/CUDA_C_Programming_Guide.pdf). On page 25 it has the following C code that does matrix multiplication. Can you please tell me how I can make that code run on two devices (if I have two NVIDIA CUDA-capable cards installed in my computer)? Could you please show me with an example? // Matrices are stored in row-major
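The usual multi-GPU pattern, sketched below under the assumption that the rows of the result can be computed independently (this is not the guide's exact code, and all names are invented), is to select each card with cudaSetDevice and give it a horizontal slice of the output:

#include <cuda_runtime.h>

// Each device computes a horizontal slice of C = A * B, all row-major.
// The slice of A has sliceRows x k elements, B is k x colsB.
__global__ void matMulSlice(const float* A, const float* B, float* C,
                            int sliceRows, int k, int colsB)
{
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= sliceRows || col >= colsB) return;
    float acc = 0.0f;
    for (int e = 0; e < k; ++e)
        acc += A[row * k + e] * B[e * colsB + col];
    C[row * colsB + col] = acc;
}

// Host side: split the rows of A (and C) between two GPUs. For clarity the
// two devices are driven one after another; overlapping them would need
// per-device host threads or streams with asynchronous copies.
void matMulTwoGPUs(const float* hA, const float* hB, float* hC,
                   int rowsA, int k, int colsB)
{
    int half = rowsA / 2;
    int rows[2]   = { half, rowsA - half };
    int offset[2] = { 0, half };

    for (int dev = 0; dev < 2; ++dev) {
        cudaSetDevice(dev);                 // subsequent calls target this GPU
        float *dA, *dB, *dC;
        cudaMalloc(&dA, rows[dev] * k * sizeof(float));
        cudaMalloc(&dB, k * colsB * sizeof(float));
        cudaMalloc(&dC, rows[dev] * colsB * sizeof(float));
        cudaMemcpy(dA, hA + offset[dev] * k,
                   rows[dev] * k * sizeof(float), cudaMemcpyHostToDevice);
        cudaMemcpy(dB, hB, k * colsB * sizeof(float), cudaMemcpyHostToDevice);

        dim3 block(16, 16);
        dim3 grid((colsB + 15) / 16, (rows[dev] + 15) / 16);
        matMulSlice<<<grid, block>>>(dA, dB, dC, rows[dev], k, colsB);

        cudaMemcpy(hC + offset[dev] * colsB, dC,
                   rows[dev] * colsB * sizeof(float), cudaMemcpyDeviceToHost);
        cudaFree(dA); cudaFree(dB); cudaFree(dC);
    }
}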

Error installing nvidia driver Ubuntu 14.04 [closed]

a 夏天 submitted on 2019-12-09 14:14:31
Question: (Closed as off-topic for Stack Overflow 4 years ago; it is not currently accepting answers.) I'm having trouble installing the driver for a GTX 980 on Ubuntu 14.04. I need to upgrade to CUDA 7.5 and the latest driver. I used both the .run installer and the .deb installer and did the purging before the installation. Here is the log: Using built-in stream user interface -> Detected 8 CPUs online; setting concurrency level

What's the correct and most efficient way to use the mapped (zero-copy) memory mechanism in an Nvidia OpenCL environment?

僤鯓⒐⒋嵵緔 submitted on 2019-12-09 14:07:12
Question: Nvidia has offered an example of how to profile bandwidth between host and device; you can find the code here: https://developer.nvidia.com/opencl (search "bandwidth"). The experiment is carried out on an Ubuntu 12.04 64-bit computer. I am inspecting pinned memory and the mapped access mode, which can be tested by invoking: ./bandwidthtest --memory=pinned --access=mapped The core test loop for host-to-device bandwidth is at around lines 736~748. I also list them here and add some comments and
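For orientation, the essence of the mapped-access path is sketched below using the standard OpenCL mapping calls (error handling trimmed; this is a simplified illustration, not the SDK sample itself):

#include <cstring>
#include <CL/opencl.h>

// "Mapped" host-to-device transfer: map the device buffer into the host
// address space, write into it with memcpy, then unmap so the driver makes
// the contents visible to the device. queue and devBuf are assumed to have
// been created already.
void writeMapped(cl_command_queue queue, cl_mem devBuf,
                 const void* src, size_t bytes)
{
    cl_int err = CL_SUCCESS;
    void* p = clEnqueueMapBuffer(queue, devBuf, CL_TRUE, CL_MAP_WRITE,
                                 0, bytes, 0, nullptr, nullptr, &err);
    if (err != CL_SUCCESS || p == nullptr) return;

    std::memcpy(p, src, bytes);               // write through the mapping

    clEnqueueUnmapMemObject(queue, devBuf, p, 0, nullptr, nullptr);
    clFinish(queue);                          // make sure the unmap has completed
}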