GPGPU

Hash table implementation for GPU [closed]

自作多情 submitted on 2019-12-03 22:06:42
Question: [Closed. This question is off-topic and is not currently accepting answers. Closed 5 years ago.] I am looking for a hash table implementation that I can use for CUDA coding. Are there any good ones out there? Something like the Python dictionary. I will use strings as my keys.

Answer 1: Alcantara et al. have demonstrated a data-parallel algorithm for building hash tables on the GPU. I believe the implementation…
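The answer is truncated, but the per-thread hashing step such a table relies on can be sketched on the CPU. This is a minimal illustration, not Alcantara et al.'s actual scheme (their published work uses parallel cuckoo hashing); `fnv1a` and `slot_for` are hypothetical helper names chosen here:

```cpp
#include <cstdint>
#include <string>

// FNV-1a 32-bit string hash: branch-free per character, which makes it a
// reasonable per-thread hash for string keys on a GPU (on the device this
// would be a __device__/inline kernel function working on char pointers).
inline uint32_t fnv1a(const std::string& key) {
    uint32_t h = 2166136261u;          // FNV offset basis
    for (unsigned char c : key) {
        h ^= c;
        h *= 16777619u;                // FNV prime
    }
    return h;
}

// Map a hash to a slot in a power-of-two table, as each thread would do
// before probing (or before choosing among cuckoo hash functions).
inline uint32_t slot_for(const std::string& key, uint32_t capacity_pow2) {
    return fnv1a(key) & (capacity_pow2 - 1);
}
```

A real GPU table layers a probing or cuckoo-displacement scheme on top of this, and fixed-length or interned keys avoid divergent string loops.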

How to manage same CUDA kernel call from multiple CPU threads?

扶醉桌前 submitted on 2019-12-03 21:27:59
I have a CUDA kernel which works fine when called from a single CPU thread. However, when the same kernel is called from multiple CPU threads (~100), most of the kernels seem not to be executed at all, as the results come out all zeros. Can someone please guide me on how to resolve this problem? In the current version of the kernel I am using a cudaDeviceSynchronize() at the end of the kernel call. Will adding a sync command before cudaMalloc() and the kernel call be of any help in this case? There is another thing which needs some clarification, i.e. if two CPU threads execute the same cudaMalloc() command, will…
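A common cause of this symptom is many host threads interleaving allocation, launch, and synchronization in one shared context. A hedged CPU-side sketch of the simplest fix, serializing the whole sequence per thread (the body of `run_kernel` is a stand-in for the real cudaMalloc/launch/cudaDeviceSynchronize calls; giving each thread its own CUDA stream is the better-performing alternative):

```cpp
#include <mutex>
#include <thread>
#include <vector>

static std::mutex gpu_mutex;   // guards the shared GPU context
static int completed = 0;      // stands in for verified kernel results

void run_kernel(int /*thread_id*/) {
    // Hold the lock across the whole allocate -> launch -> sync -> free
    // sequence so launches from different host threads cannot interleave.
    std::lock_guard<std::mutex> lock(gpu_mutex);
    // cudaMalloc(...); kernel<<<grid, block>>>(...);
    // cudaDeviceSynchronize(); cudaFree(...);
    ++completed;
}

int launch_from_threads(int n) {
    completed = 0;
    std::vector<std::thread> pool;
    for (int i = 0; i < n; ++i) pool.emplace_back(run_kernel, i);
    for (auto& t : pool) t.join();
    return completed;
}
```

With the mutex, every one of the `n` logical launches completes; without it, concurrent API calls into one context can race exactly as the question describes.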

GPGPU programming with OpenGL ES 2.0

独自空忆成欢 submitted on 2019-12-03 18:46:13
Question: I am trying to do some image processing on the GPU, e.g. median, blur, brightness, etc. The general idea is to do something like the framework from GPU Gems 1. I am able to write the GLSL fragment shader for processing the pixels, as I've been trying out different things in an effect designer app. I am not sure, however, how I should do the other part of the task. That is, I'd like to be working on the image in image coords and then outputting the result to a texture. I am aware of the gl…
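The image-coordinate part of such a fragment-shader framework boils down to one mapping: render a full-screen quad into a framebuffer-attached texture of the same size, and convert pixel coordinates to texel-centre texture coordinates so each input pixel is sampled exactly once. A small sketch of that mapping (the function name is my own):

```cpp
#include <utility>

// Pixel (x, y) of a W×H image maps to the texel centre
// ((x + 0.5)/W, (y + 0.5)/H). Using texel centres in a 1:1
// render-to-texture pass avoids filtering between neighbouring
// pixels when the texture is sampled with GL_NEAREST/GL_LINEAR.
std::pair<float, float> pixel_to_texcoord(int x, int y, int w, int h) {
    return { (x + 0.5f) / w, (y + 0.5f) / h };
}
```

In the shader itself the same arithmetic lets you address neighbours for median/blur kernels by offsetting by 1/W and 1/H.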

How to launch custom OpenCL kernel in OpenCV (3.0.0) OCL?

怎甘沉沦 submitted on 2019-12-03 18:44:08
Question: I'm probably misusing OpenCV by using it as a wrapper to the official OpenCL C++ bindings so that I can launch my own kernels. However, OpenCV does have classes like Program, ProgramSource, Kernel, Queue, etc. that seem to tell me that I can launch my own (even non-image-based) kernels with OpenCV. I am having trouble finding documentation for these classes, let alone examples. So, I took a stab at it so far:

#include <fstream>
#include <iostream>
#include "opencv2/opencv.hpp"
…

OpenCL dynamic parallelism / GPU-spawned threads?

笑着哭i submitted on 2019-12-03 17:33:20
CUDA 5 has just been released, and with it the ability to spawn GPU threads from within another GPU (main?) thread, minimising the callouts between CPU and GPU that we've seen thus far. What plans are there to support GPU-spawned threads in the OpenCL arena? As I cannot afford to opt for a closed standard (my user base is "everygamer"), I need to know when OpenCL is ready for prime time in this regard. The OpenCL standard usually lags behind CUDA (except for the device partitioning feature), and I guess this feature will be added to OpenCL in a year. EDIT on Aug 8, 2013: This feature has been…

General purpose compute with Vertex/Pixel shaders (Open GL / DirectX)

混江龙づ霸主 submitted on 2019-12-03 15:22:18
I have a question regarding compute shaders. Are compute shaders available in DX9? Would it still be possible to use a compute shader with a DX9 driver if there is no compute shader hardware on the GPU? (SGX 545 does not have it, but the SGX 6X generation is going to have it, according to IMG.) I would like to know if I can do some simple general-purpose programming on SGX GPUs with DirectX9 or OpenGL drivers. Also, is there any way I can use OpenGL vertex shaders for GPGPU programming? Here is what I am thinking: I will load my matrices/data into the vertex buffer and bind them to the…

Dynamically create a local array inside an OpenCL kernel

天大地大妈咪最大 submitted on 2019-12-03 15:05:50
I have an OpenCL kernel that needs to process an array as multiple sub-arrays, where each sub-array sum is saved in a local cache array. For example, imagine the following array: [[1, 2, 3, 4], [10, 30, 1, 23]]. Each work-group gets one sub-array (in the example we have 2 work-groups); each work-item processes two array indexes (for example, multiplying the value by the local_id), and the work-item result is saved in a work-group shared array.

__kernel void test(__global int **values, __global int *result, const int array_size){
    __local int cache[array_size];
    // initialise
    if (get_local_id(0) == 0){
        for (int i…
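Note that OpenCL C does not allow a __local array sized by a kernel argument: local memory must either be sized at compile time or declared as a `__local int *cache` parameter and sized from the host with `clSetKernelArg(kernel, arg_index, sizeof(int) * n, NULL)`. The per-work-group result the question is after can be sketched on the CPU like this (hypothetical helper name, each inner vector playing the role of one work-group's sub-array):

```cpp
#include <numeric>
#include <vector>

// Each "work-group" reduces its own sub-array to a single sum.
// On the device, work-items would accumulate partial results into
// the __local cache and a barrier(CLK_LOCAL_MEM_FENCE) would precede
// the final reduction; here std::accumulate stands in for all of that.
std::vector<int> group_sums(const std::vector<std::vector<int>>& groups) {
    std::vector<int> result;
    for (const auto& g : groups)
        result.push_back(std::accumulate(g.begin(), g.end(), 0));
    return result;
}
```

For the example input, the expected `result` buffer is one sum per work-group.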

What do work items execute when conditionals are used in GPU programming?

六眼飞鱼酱① submitted on 2019-12-03 14:02:50
Question: If you have work-items executing in a wavefront and there is a conditional such as:

if(x){ ... } else { ... }

what do the work-items execute? Is it the case that all work-items in the wavefront will execute the first branch (i.e. x == true), and if there are no work-items for which x is false, the rest of the conditional is skipped? What happens if one work-item takes the alternative path? I am told that all work-items will execute the alternate path as well (therefore executing both paths…
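The usual model is exactly that: the wavefront executes each branch side that at least one lane takes, in lockstep, with the non-participating lanes masked off. The pass-count intuition can be sketched with a hypothetical helper (each bool is one work-item's predicate):

```cpp
#include <vector>

// Returns how many branch bodies the wavefront executes for a given
// predicate pattern: one per side of the if/else that at least one
// lane takes. Uniform predicates cost one pass; divergence costs both.
int divergent_passes(const std::vector<bool>& predicate) {
    bool any_true = false, any_false = false;
    for (bool p : predicate) {
        if (p) any_true = true;
        else   any_false = true;
    }
    return (any_true ? 1 : 0) + (any_false ? 1 : 0);
}
```

So a wavefront where every lane agrees skips the other branch entirely, while a single dissenting lane forces both bodies to be issued, which is why uniform branching is cheap and divergence is not.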

How to debug OpenCL on Nvidia GPUs?

做~自己de王妃 submitted on 2019-12-03 12:39:48
Is there any way to debug OpenCL kernels on an Nvidia GPU, i.e. set breakpoints and inspect variables? My understanding is that Nvidia's tool does not allow OpenCL debugging, and AMD's and Intel's only allow it on their own devices. gDEBugger might help you somewhat (never used it though), but other than that there isn't any tool I know of that can set breakpoints or inspect variables inside a kernel. Perhaps try to save intermediate outputs from your kernel if it is a long kernel. Sorry I can't give you a magic solution; debugging OpenCL is just hard. Source: https://stackoverflow.com…

Running OpenCL on hardware from mixed vendors

早过忘川 submitted on 2019-12-03 12:35:19
I've been playing with the ATI OpenCL implementation in their Stream 2.0 beta. The OpenCL in the current beta only uses the CPU for now; the next version is supposed to support GPU kernels. I downloaded Stream because I have an ATI GPU in my work machine. I write software that would benefit hugely from GPU gains. However, this software runs on customer machines; I don't have the luxury (as many scientific computing environments do) of choosing the exact hardware to develop for and optimizing for that. So my question is: if I distribute the ATI OpenCL implementation with my…