gpgpu

enqueueWriteImage fails on GPU

Posted by 僤鯓⒐⒋嵵緔 on 2020-01-03 05:14:12
Question: I am developing some kernels that work with image buffers. The problem is that when I create my Image2D by directly copying the image data, everything works well. If I instead try to enqueue a write to my image buffer, it does not work on my GPU. Here is a basic kernel: __kernel void myKernel(__read_only image2d_t in, __write_only image2d_t out) { const int x = get_global_id(0); const int y = get_global_id(1); const sampler_t sampler = CLK_NORMALIZED_COORDS_FALSE | CLK_CLAMP_TO_EDGE | CLK
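A minimal host-side sketch of the pattern the question describes, using the standard C OpenCL API. All names, the RGBA/UNORM_INT8 format, and the sizes are assumptions, not taken from the question: create the image without a host pointer, then upload the pixels with clEnqueueWriteImage.

```c
#include <string.h>
#include <CL/cl.h>

cl_int upload_rgba_image(cl_context ctx, cl_command_queue queue,
                         const unsigned char *pixels,
                         size_t width, size_t height, cl_mem *image_out)
{
    cl_int err = CL_SUCCESS;

    /* RGBA, 8 bits per channel -- an assumption about the source data. */
    cl_image_format fmt = { CL_RGBA, CL_UNORM_INT8 };

    cl_image_desc desc;
    memset(&desc, 0, sizeof(desc));
    desc.image_type   = CL_MEM_OBJECT_TYPE_IMAGE2D;
    desc.image_width  = width;
    desc.image_height = height;

    /* Create the image without CL_MEM_COPY_HOST_PTR / CL_MEM_USE_HOST_PTR ... */
    *image_out = clCreateImage(ctx, CL_MEM_READ_ONLY, &fmt, &desc, NULL, &err);
    if (err != CL_SUCCESS)
        return err;

    /* ... then upload explicitly. origin/region are in pixels; a row pitch of 0
       tells the runtime the host rows are tightly packed. */
    size_t origin[3] = { 0, 0, 0 };
    size_t region[3] = { width, height, 1 };
    return clEnqueueWriteImage(queue, *image_out, CL_TRUE, origin, region,
                               0, 0, pixels, 0, NULL, NULL);
}
```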

Cuda-gdb not stopping at breakpoints inside kernel

Posted by 元气小坏坏 on 2020-01-03 04:39:19
Question: cuda-gdb was obeying all the breakpoints I set, before I added the '-arch sm_20' flag while compiling. I had to add this flag to avoid the error 'atomicAdd is undefined' (as pointed out here). Here is my current command to compile the code: nvcc -g -G --maxrregcount=32 Main.cu -o SW_exe (..including header files...) -arch sm_20 and when I set a breakpoint inside the kernel, cuda-gdb stops once at the last line of the kernel, and then the program continues. (cuda-gdb) b SW_kernel_1.cu:49

Flickering GPGPU Particles Three.js

Posted by 六眼飞鱼酱① on 2020-01-02 23:05:50
Question: I've created a particle simulation (thing) with Three.js. I'm super happy with the simulation part of it, but for some reason I get a lot of screen flickering when I run it. The effect is more apparent on some machines / graphics cards, but I've noticed flickering on all of them. Here is the demo. Here is the source. Things I've tried: narrowing the visible range on the camera; removing transparency from the PointCloud material; making each "simulator" have its own camera, scene and mesh (I

Installed beignet to use OpenCL on Intel, but OpenCL programs only work when run as root

Posted by |▌冷眼眸甩不掉的悲伤 on 2020-01-02 07:02:45
Question: I have an Intel HD Graphics 4000 (3rd-gen processor), and my OS is Linux Mint 17.1 64-bit. I installed beignet to be able to use OpenCL and thus run programs on the GPU. I had been having lots of problems with the PyOpenCL bindings, so I decided to uninstall my current beignet version and install the latest one (you can see the previous question I asked and answered myself about it here). Upgrading beignet worked, and I can now run OpenCL code on my GPU through the Python and C/C++ bindings.

Fast Fourier transforms on GPU on iOS

Posted by 本秂侑毒 on 2020-01-02 02:53:11
Question: I am implementing compute-intensive applications for iOS (i.e., iPhone or iPad) that heavily use fast Fourier transforms (and some signal-processing operations such as interpolation and resampling). What are the best libraries and APIs for running FFTs on iOS? I have briefly looked into Apple Metal as well as Apple vDSP. I wasn't sure whether vDSP utilizes the GPU, although it seems to be highly parallelized and uses SIMD. Metal seems to allow access to the GPU for compute-intensive apps
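For context, vDSP is part of Apple's Accelerate framework and runs on the CPU's SIMD units rather than the GPU. A minimal sketch of a 1024-point forward FFT with vDSP (buffer names and the size are illustrative assumptions, not from the question):

```c
/* Minimal vDSP sketch (CPU/SIMD via the Accelerate framework, not the GPU):
   a 1024-point in-place forward FFT on split-complex data. */
#include <Accelerate/Accelerate.h>
#include <stdlib.h>

void fft_example(void)
{
    const vDSP_Length log2n = 10;            /* 2^10 = 1024 points */
    const vDSP_Length n = 1u << log2n;

    /* vDSP uses split-complex storage: separate real and imaginary arrays. */
    float *re = calloc(n, sizeof(float));
    float *im = calloc(n, sizeof(float));
    DSPSplitComplex split = { re, im };

    FFTSetup setup = vDSP_create_fftsetup(log2n, kFFTRadix2);
    vDSP_fft_zip(setup, &split, 1, log2n, kFFTDirection_Forward);

    vDSP_destroy_fftsetup(setup);
    free(re);
    free(im);
}
```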

In OpenCL, what does mem_fence() do, as opposed to barrier()?

Posted by 為{幸葍}努か on 2020-01-02 01:04:08
Question: Unlike barrier() (which I think I understand), mem_fence() does not affect all items in the work-group. The OpenCL spec says (section 6.11.10), for mem_fence(): "Orders loads and stores of a work-item executing a kernel." (so it applies to a single work-item). But, at the same time, section 3.3.1 says: "Within a work-item memory has load / store consistency." So within a work-item the memory is already consistent. What kind of thing is mem_fence() useful for, then? It doesn't work across items,
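To make the distinction concrete, here is an illustrative OpenCL C sketch (kernel and argument names are hypothetical, not from the question). mem_fence() only orders the calling work-item's own memory operations as they become visible to others; unlike barrier(), no work-item waits at it.

```c
/* Illustrative sketch, not a robust synchronization pattern: work-item 0
   publishes a value and then a flag. mem_fence() orders the two stores of
   THAT work-item as observed by other work-items; nothing here makes any
   work-item wait, which is the difference from barrier(). */
__kernel void publish(__global volatile int *data,
                      __global volatile int *ready,
                      __global int *out)
{
    const int lid = get_local_id(0);

    if (lid == 0) {
        data[0] = 42;
        mem_fence(CLK_GLOBAL_MEM_FENCE);   /* commit data[0] before ready[0] */
        ready[0] = 1;
    }

    /* A work-item that happens to observe ready[0] == 1 can also order its own
       reads with a fence and then see the published data. */
    if (lid == 1 && ready[0] == 1) {
        read_mem_fence(CLK_GLOBAL_MEM_FENCE);
        out[0] = data[0];
    }
}
```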

Sort 2D array in CUDA with Thrust

Posted by 拟墨画扇 on 2020-01-01 22:13:14
Question: I have a 2D array and I want to sort it by row, meaning that if the array is

3 2 2 3 2 2 3 3 3 3
3 3 2 2 2 2 3 3 2 2
3 2 2 3 2 2 3 3 3 2
2 2 2 2 2 2 2 2 2 2
3 2 2 2 2 2 3 2 2 2
2 2 2 2 2 2 2 2 2 2
3 3 2 3 2 2 3 3 2 3
3 3 2 2 2 2 3 3 3 3
3 2 2 3 2 2 3 3 2 3
3 3 2 3 2 2 3 3 3 3

I want to obtain the array

2 2 2 2 2 2 2 2 2 2
2 2 2 2 2 2 2 2 2 2
3 2 2 2 2 2 3 2 2 2
3 2 2 3 2 2 3 3 2 3
3 2 2 3 2 2 3 3 3 2
3 2 2 3 2 2 3 3 3 3
3 3 2 2 2 2 3 3 2 2
3 3 2 2 2 2 3 3 3 3
3 3 2 3 2 2 3 3 2 3
3 3 2 3 2 2 3 3
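The example appears to reorder whole rows in lexicographic order rather than sorting the elements inside each row. A hedged Thrust sketch under that assumption (all names and the row-major int layout are assumptions): sort an array of row indices with a device comparator, then gather the rows in the new order.

```cuda
// Hedged Thrust sketch (CUDA C++): reorder the rows of a row-major 2D array
// lexicographically by sorting row indices, then copying rows into place.
#include <thrust/device_vector.h>
#include <thrust/sequence.h>
#include <thrust/sort.h>
#include <thrust/copy.h>

struct row_less
{
    const int *data;   // raw device pointer to the row-major array
    int cols;

    row_less(const int *d, int c) : data(d), cols(c) {}

    __host__ __device__
    bool operator()(int a, int b) const
    {
        // Compare row a and row b element by element (lexicographic order).
        for (int j = 0; j < cols; ++j) {
            const int va = data[a * cols + j];
            const int vb = data[b * cols + j];
            if (va != vb) return va < vb;
        }
        return false;
    }
};

void sort_rows_lexicographically(thrust::device_vector<int>& d, int rows, int cols)
{
    // 1. Sort row indices on the device using the comparator above.
    thrust::device_vector<int> idx(rows);
    thrust::sequence(idx.begin(), idx.end());
    thrust::sort(idx.begin(), idx.end(),
                 row_less(thrust::raw_pointer_cast(d.data()), cols));

    // 2. Gather the rows into a new buffer in the sorted order.
    thrust::device_vector<int> out(rows * cols);
    for (int r = 0; r < rows; ++r) {
        const int src = idx[r];   // one device-to-host read per row
        thrust::copy_n(d.begin() + src * cols, cols, out.begin() + r * cols);
    }
    d.swap(out);
}
```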

Does GPGPU programming only allow the execution of SIMD instructions?

Posted by 浪子不回头ぞ on 2020-01-01 17:03:48
Question: Does GPGPU programming only allow the execution of SIMD instructions? If so, then it must be a tedious task to rewrite an algorithm that was designed to run on a general-purpose CPU so that it runs on a GPU. Also, is there a pattern in algorithms that can be converted to a SIMD architecture? Answer 1: Well, it's not quite accurate that GPGPU only supports SIMD execution. Many GPUs have some non-SIMD components. But, overall, to take full advantage of a GPU you need to be running SIMD code. However, you are NOT
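A small CUDA sketch of the SIMT point (kernel name and data are hypothetical): the source is ordinary scalar, per-thread code with branches; the hardware maps it onto SIMD-like warps and serializes divergent paths, so the programmer never writes explicit SIMD instructions.

```cuda
// Hypothetical CUDA kernel: each thread runs plain scalar code (SIMT).
// Threads of a warp execute in lockstep; divergent branches are allowed
// but are serialized within the warp, which costs performance, not correctness.
__global__ void scale_positive(const float *in, float *out, int n)
{
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;               // bounds check, per thread

    if (in[i] > 0.0f) {
        out[i] = in[i] * 2.0f;        // threads taking this path ...
    } else {
        out[i] = 0.0f;                // ... and this path are serialized per warp
    }
}
```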

Are GPU Kepler CC3.0 processors not only a pipelined architecture, but also superscalar? [closed]

Posted by ぐ巨炮叔叔 on 2020-01-01 12:07:08
Question: (Closed as off-topic; it is not currently accepting answers. Closed 4 years ago.) The CUDA 6.5 documentation says ( http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#ixzz3PIXMTktb ), in section 5.2.3, Multiprocessor Level: ... 8L for devices of compute capability 3.x, since a multiprocessor issues a pair of instructions per warp over one clock cycle for four warps at a time, as

What is the best way to programmatically choose the best GPU in OpenCL?

Posted by 别说谁变了你拦得住时间么 on 2020-01-01 05:46:09
Question: On my laptop I have two graphics cards - an Intel Iris and an Nvidia GeForce GT 750M. I am trying to do a simple vector add using OpenCL. I know that the Nvidia card is much faster and can do the job better. In principle, I could put an if statement in the code that looks for NVIDIA in the VENDOR attribute, but I'd like something more elegant. What is the best way to choose the better (faster) GPU programmatically in OpenCL C/C++? Answer 1: I developed a real-time ray tracer (not just a ray caster)
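One common heuristic, shown as a sketch rather than a definitive method (and scores from different vendors are only roughly comparable): enumerate every GPU device on every platform and rank them by compute units times clock frequency.

```c
/* Hedged sketch (C OpenCL API): score each GPU as compute_units * clock_MHz
   and return the device with the highest score. A rough proxy, not a benchmark. */
#include <stddef.h>
#include <CL/cl.h>

cl_device_id pick_fastest_gpu(void)
{
    cl_platform_id platforms[8];
    cl_uint num_platforms = 0;
    clGetPlatformIDs(8, platforms, &num_platforms);

    cl_device_id best = NULL;
    cl_ulong best_score = 0;

    for (cl_uint p = 0; p < num_platforms; ++p) {
        cl_device_id devices[8];
        cl_uint num_devices = 0;
        if (clGetDeviceIDs(platforms[p], CL_DEVICE_TYPE_GPU, 8,
                           devices, &num_devices) != CL_SUCCESS)
            continue;   /* platform has no GPU devices */

        for (cl_uint d = 0; d < num_devices; ++d) {
            cl_uint cus = 0, mhz = 0;
            clGetDeviceInfo(devices[d], CL_DEVICE_MAX_COMPUTE_UNITS,
                            sizeof(cus), &cus, NULL);
            clGetDeviceInfo(devices[d], CL_DEVICE_MAX_CLOCK_FREQUENCY,
                            sizeof(mhz), &mhz, NULL);

            const cl_ulong score = (cl_ulong)cus * mhz;
            if (score > best_score) {
                best_score = score;
                best = devices[d];
            }
        }
    }
    return best;   /* NULL if no GPU was found */
}
```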