gpgpu

How to find haze extent of an image with OpenCV on CUDA? [closed]

馋奶兔 submitted on 2019-12-14 03:24:56
Question: Closed. This question needs to be more focused and is not currently accepting answers. Closed 4 years ago. I'm trying to find the maximum and minimum of the RGB values of an image. The flow I was planning to follow is: load the image; after loading the image, create a 15x15 cell around the cell to be tested; find the max of the RGB values of the test cell and store it in an array; then print the …

Trying to mix in OpenCL with CUDA in NVIDIA's SDK template

你。 submitted on 2019-12-14 02:38:44
Question: I have been having a tough time setting up an experiment where I allocate memory on the device with CUDA, take that pointer to device memory, use it in OpenCL, and return the results. I want to see if this is possible. I had a tough time getting a CUDA project to work, so I just used Nvidia's template project from their SDK. In the makefile I added -lOpenCL to the libs section of common.mk. Everything is fine when I do that, but when I add #include <CL/cl.h> to template.cu so I can …

NVRTC and __device__ functions

≯℡__Kan透↙ submitted on 2019-12-14 02:24:06
Question: I am trying to optimize my simulator by leveraging run-time compilation. My code is pretty long and complex, but I have identified a specific __device__ function whose performance can be strongly improved by removing all global memory accesses. Does CUDA allow the dynamic compilation and linking of a single __device__ function (not a __global__), in order to "override" an existing function? Answer 1: I am pretty sure the really short answer is no. Although CUDA has dynamic/JIT device linker support, …

How to deal with NaN or inf in OpenGL ES 2.0 shaders

青春壹個敷衍的年華 submitted on 2019-12-14 01:31:32
Question: This is based on the question: Best way to detect NaN's in OpenGL shaders. Standard GLSL defines isnan() and isinf() functions for detection; the OpenGL ES 2.0 shading language doesn't. How can I deal with NaNs and Infs nevertheless? Answer 1: You can check for NaN via a condition that is only true for NaN: bool isNan(float val) { return (val <= 0.0 || 0.0 <= val) ? false : true; } isinf is a bit more difficult. There is no mechanism to convert the float into its integer representation and play …

“Global Load Efficiency” over 100%

荒凉一梦 submitted on 2019-12-13 15:22:22
Question: I have a CUDA program in which the threads of a block read elements of a long array over several iterations, and the memory accesses are almost fully coalesced. When I profile, Global Load Efficiency is over 100% (between 119% and 187%, depending on the input). The description of Global Load Efficiency is "Ratio of global memory load throughput to required global memory load throughput." Does it mean that I'm hitting the L2 cache a lot and my memory accesses are benefiting from it? My GPU is a GeForce GTX 780 …

Cumulative summation in CUDA

谁说我不能喝 submitted on 2019-12-13 13:32:53
Question: Can someone please point me in the right direction on how to do this type of calculation in parallel, or tell me what the general name of this method is? I don't think these will return the same result. C++: for (int i = 1; i < width; i++) x[i] = x[i] + x[i-1]; CUDA: int i = blockIdx.x * blockDim.x + threadIdx.x; if ((i > 0) && (i < width)) x[i] = x[i] + x[i-1]; Answer 1: This looks like a cumulative sum operation, in which the final value of x[i] is the sum of all values x[0]...x[i] in the …

Running OpenCL on hardware from mixed vendors

拟墨画扇 submitted on 2019-12-13 11:41:34
Question: I've been playing with the ATI OpenCL implementation in their Stream 2.0 beta. The OpenCL in the current beta only uses the CPU for now; the next version is supposed to support GPU kernels. I downloaded Stream because I have an ATI GPU in my work machine. I write software that would benefit hugely from GPU acceleration. However, this software runs on customer machines; I don't have the luxury (as many scientific computing environments have) of choosing the exact hardware to develop for, …

CUDA: How to pass multiple duplicated arguments to CUDA Kernel

断了今生、忘了曾经 submitted于 2019-12-13 08:14:55
Question: I'm looking for an elegant way to pass multiple duplicated arguments to a CUDA kernel. As we all know, each kernel argument is located on the stack of each CUDA thread; therefore, arguments passed by the kernel to each thread may be duplicated in each stack's memory. I'm looking for an elegant way to minimize the number of duplicated arguments being passed. To explain my concern, let's say my code looks like this: kernelFunction …

LLVM front end register class error OpenCL — GPU target

╄→гoц情女王★ submitted on 2019-12-13 06:56:25
Question: I've recently been encountering this error when compiling OpenCL kernel files with my LLVM IR pass: aoc: ../../../TargetRegisterInfo.cpp:89: const llvm::TargetRegisterClass* llvm::TargetRegisterInfo::getMinimalPhysRegClass(unsigned int, llvm::EVT) const: Assertion `BestRC && "Couldn't find the register class"' failed. I'm not sure what this means, and what I've read from the documentation doesn't make a lot of sense. Basically it means the backend doesn't know what type to place into the register?

OpenCl: Minimal configuration to work with AMD GPU

 ̄綄美尐妖づ submitted on 2019-12-13 06:12:41
Question: Suppose we have an AMD GPU (for example, a Radeon HD 7970) and a minimal Linux system without X, etc. What should be installed, what should be launched, and how should it be launched to have a proper OpenCL environment? Ideally it should be a headless environment. Requirements for the environment: the GPU is visible to OpenCL programs (clinfo, for example); it is possible to monitor the temperature and set the fan speed (for example, using aticonfig). P.S. Simply installing an X server and Catalyst and running X :0 won't work …