opencl

OpenCL - How do I query for a device's SIMD width?

╄→гoц情女王★ submitted on 2019-11-30 13:42:45
Question: In CUDA, there is the concept of a warp, which is defined as the maximum number of threads that can execute the same instruction simultaneously within a single processing element. For NVIDIA, this warp size is 32 for all of their cards currently on the market. ATI cards have a similar concept, but the terminology in this context is wavefront. After some hunting around, I found out that the ATI card I have has a wavefront size of 64. My question is, what can I do to query for this SIMD…

Is there a limit to OpenCL local memory?

我是研究僧i submitted on 2019-11-30 12:53:45
Today I added four more __local variables to my kernel to dump intermediate results into. But just adding the four variables to the kernel's signature and adding the corresponding kernel arguments renders all output of the kernel to "0"s. None of the cl functions returns an error code. I further tried adding only the two smaller variables. If I add only one of them, it works, but if I add both of them, it breaks down. So could this behavior of OpenCL mean that I allocated too much __local memory? How do I find out how much __local memory is usable by me? Kyle Lutz: The amount of…

What is the context switching mechanism in GPU?

╄→尐↘猪︶ㄣ submitted on 2019-11-30 12:23:43
Question: As far as I know, GPUs switch between warps to hide memory latency. But I wonder under which conditions a warp will be switched out. For example, if a warp performs a load and the data is already in the cache, is the warp switched out, or does it continue with the next computation? What happens if there are two consecutive adds? Thanks. Answer 1: First of all, once a thread block is launched on a multiprocessor (SM), all of its warps are resident until they all exit the kernel. Thus a block is not launched…

When will OpenCL 1.2 for NVIDIA hardware be available?

杀马特。学长 韩版系。学妹 submitted on 2019-11-30 11:27:59
I would have asked this question on the NVIDIA developer forum, but since it's still down, maybe someone here can tell me something. Does anybody know if there is already OpenCL 1.2 support in NVIDIA's driver? If not, is it coming soon? I don't have a GeForce 600 series card to check myself. According to Wikipedia, there are already some cards that could support it, though. It somewhat seems like NVIDIA does not mention OpenCL a whole lot anymore and just focuses on CUDA C/C++ (see StreamComputing.eu). I guess it makes sense to them, but I would like to see some more OpenCL love. Thanks. James…

How to use C++ templates in OpenCL kernels?

走远了吗. submitted on 2019-11-30 10:50:58
Question: I'm a novice in OpenCL. I have an algorithm which uses templates. It worked well with OpenMP parallelization, but now the amount of data has grown, and the only way to process it is to rewrite it to use OpenCL. I could easily use MPI to build it for a cluster, but a Tesla-like GPU is much cheaper than a cluster :) Is there any way to use C++ templates in an OpenCL kernel? Is it possible to somehow expand the templates with the C++ compiler or some other tool, and then use the transformed kernel function? EDIT: The idea of…

OpenCL Floating point precision

南楼画角 submitted on 2019-11-30 10:04:41
I found a problem with host vs. device floating point behavior in OpenCL. The problem is that the floating point values calculated by OpenCL are not within the same limits as those from my Visual Studio 2010 compiler when compiling for x86; when compiling for x64, they are within the same limits. I know it has to be something to do with http://www.viva64.com/en/b/0074/ . The source I used during testing was http://www.codeproject.com/Articles/110685/Part-1-OpenCL-Portable-Parallelism . When I ran the program as x86, it would give me 202 numbers that were equal, when the kernel and the C++ program took the square of…

Can I run CUDA or OpenCL on Intel Iris?

别等时光非礼了梦想. submitted on 2019-11-30 09:39:15
I have a MacBook Pro (mid 2014) with Intel Iris, an Intel Core i5 processor, and 16 GB of RAM. I am planning to learn some ray-traced 3D, but I am not sure if my laptop can render fast without any NVIDIA hardware. So I would appreciate it if someone could tell me whether I can use CUDA; if not, could you please show me, in a very easy way, how to enable OpenCL in After Effects? I am looking for any beginner tutorial on how to get started with OpenCL. Answer: CUDA works only on NVIDIA hardware, but there may be some libraries converting it to run on CPU cores (not the iGPU). AMD is working on "hipify"ing…

Cannot compile OpenCL application using 1.2 headers in 1.1 version

◇◆丶佛笑我妖孽 submitted on 2019-11-30 09:24:29
I'm writing a small hello-world OpenCL program using the Khronos Group's cl.hpp for OpenCL 1.2 and NVIDIA's OpenCL libraries. The drivers and ICD I have support OpenCL 1.1. Since the NVIDIA side doesn't support 1.2 yet, I get some errors on functions required in OpenCL 1.2. On the other side, the cl.hpp for OpenCL 1.2 has a flag, CL_VERSION_1_1 to be exact, to run the header in 1.1 mode, but it's not working. Does anybody have a similar experience or a solution? Note: cl.hpp for version 1.1 works, but it generates many warnings during compilation. That is why I'm trying to use the 1.2 version. Unfortunately NVIDIA…

Compiling an OpenCL program using a CL/cl.h file

青春壹個敷衍的年華 submitted on 2019-11-30 08:14:33
I have sample "Hello, World!" code from the net and I want to run it on the GPU of my university's server. When I type "gcc main.c", it responds with: CL/cl.h: No such file or directory. What should I do? How can I get this header file? Answer: Make sure you have the appropriate toolkit installed. This depends on what you intend to run your code on. If you have an NVIDIA card, then you need to download and install the CUDA toolkit, which also contains the necessary binaries and libraries for OpenCL. Are you running Linux? If you believe you already have OpenCL installed, it could be that it is found at…

OpenCL - How do I query for a device's SIMD width?

主宰稳场 submitted on 2019-11-30 08:05:19
In CUDA, there is the concept of a warp, which is defined as the maximum number of threads that can execute the same instruction simultaneously within a single processing element. For NVIDIA, this warp size is 32 for all of their cards currently on the market. ATI cards have a similar concept, but the terminology in this context is wavefront. After some hunting around, I found out that the ATI card I have has a wavefront size of 64. My question is, what can I do to query for this SIMD width at runtime in OpenCL? Answer: I found the answer I was looking for. It turns out that you don't query…