opencl

OpenCL - How do I query for a device's SIMD width?

╄→гoц情女王★ submitted on 2019-11-30 13:42:45
Question: In CUDA, there is the concept of a warp, which is defined as the maximum number of threads that can execute the same instruction simultaneously within a single processing element. For NVIDIA, this warp size is 32 for all of their cards currently on the market. ATI cards have a similar concept, but the terminology in this context is wavefront. After some hunting around, I found out that the ATI card I have has a wavefront size of 64. My question is, what can I do to query for this SIMD…

Is there a limit to OpenCL local memory?

我是研究僧i submitted on 2019-11-30 12:53:45
Today I added four more __local variables to my kernel to dump intermediate results into. But just adding the four variables to the kernel's signature and adding the corresponding kernel arguments renders all output of the kernel to "0"s. None of the cl functions returns an error code. I further tried adding only the two smaller variables. If I add only one of them, it works, but if I add both of them, it breaks down. So could this behavior of OpenCL mean that I allocated too much __local memory? How do I find out how much __local memory is usable by me? Kyle Lutz: The amount of…

What is the context switching mechanism in GPU?

╄→尐↘猪︶ㄣ submitted on 2019-11-30 12:23:43
Question: As far as I know, GPUs switch between warps to hide memory latency. But I wonder under which conditions a warp will be switched out. For example, if a warp performs a load and the data is already in the cache, is the warp switched out, or does it continue with the next computation? What happens if there are two consecutive adds? Thanks. Answer 1: First of all, once a thread block is launched on a multiprocessor (SM), all of its warps are resident until they all exit the kernel. Thus a block is not launched…

When will OpenCL 1.2 for NVIDIA hardware be available?

杀马特。学长 韩版系。学妹 submitted on 2019-11-30 11:27:59
I would have asked this question on the NVIDIA developer forum, but since it's still down, maybe someone here can tell me something. Does anybody know if there is already OpenCL 1.2 support in NVIDIA's driver? If not, is it coming soon? I don't have a GeForce 600 series card to check myself. According to Wikipedia, there are already some cards that could support it, though. It somewhat seems like NVIDIA does not mention OpenCL a whole lot anymore and just focuses on CUDA C/C++ (see StreamComputing.eu). I guess it makes sense to them, but I would like to see some more OpenCL love. Thanks. James…

How to use C++ templates in OpenCL kernels?

走远了吗. submitted on 2019-11-30 10:50:58
Question: I'm a novice in OpenCL. I have an algorithm which uses templates. It worked well with OpenMP parallelization, but now the amount of data has grown, and the only way to process it is to rewrite it to use OpenCL. I could easily use MPI to build it for a cluster, but a Tesla-like GPU is much cheaper than a cluster :) Is there any way to use C++ templates in an OpenCL kernel? Is it possible to somehow expand the templates with the C++ compiler or some other tool, and then use the transformed kernel function? EDIT: The idea of…

OpenCL Floating point precision

南楼画角 submitted on 2019-11-30 10:04:41
I found a problem with host vs. device floating point behavior in OpenCL. The problem is that the floating point values calculated by OpenCL are not within the same limits as those from my Visual Studio 2010 compiler when compiling for x86; when compiling for x64, they are within the same limits. I know it has to be something to do with http://www.viva64.com/en/b/0074/ . The source I used during testing was http://www.codeproject.com/Articles/110685/Part-1-OpenCL-Portable-Parallelism . When I ran the program as x86, it would give me 202 numbers that were equal, when the kernel and the C++ program took the square of…

Can I run CUDA or OpenCL on Intel Iris?

别等时光非礼了梦想. submitted on 2019-11-30 09:39:15
I have a MacBook Pro (mid 2014) with Intel Iris, an Intel Core i5 processor, and 16 GB of RAM. I am planning to learn some ray-traced 3D, but I am not sure if my laptop can render fast without any NVIDIA hardware. So I would appreciate it if someone could tell me whether I can use CUDA; if not, could you please show me, in a very easy way, how to enable OpenCL in After Effects? I am looking for any beginner tutorial on how to get started with OpenCL. Answer: CUDA works only on NVIDIA hardware, but there may be some libraries converting it to run on CPU cores (not the iGPU). AMD is working on "hipify"ing…

Cannot compile OpenCL application using 1.2 headers in 1.1 version

◇◆丶佛笑我妖孽 submitted on 2019-11-30 09:24:29
I'm writing a small hello-world OpenCL program using the Khronos Group's cl.hpp for OpenCL 1.2 and NVIDIA's OpenCL libraries. The drivers and ICD I have support OpenCL 1.1. Since the NVIDIA side doesn't support 1.2 yet, I get some errors on functions required in OpenCL 1.2. On the other side, the cl.hpp for OpenCL 1.2 has a flag, CL_VERSION_1_1 to be exact, to run the header in 1.1 mode, but it's not working. Does anybody have a similar experience or a solution? Note: cl.hpp for version 1.1 works, but it generates many warnings during compilation. That is why I'm trying to use the 1.2 version. Unfortunately NVIDIA…

Compiling an OpenCL program using a CL/cl.h file

青春壹個敷衍的年華 submitted on 2019-11-30 08:14:33
I have sample "Hello, World!" code from the net and I want to run it on the GPU of my university's server. When I type "gcc main.c", it responds with: CL/cl.h: No such file or directory. What should I do? How can I get this header file? Answer: Make sure you have the appropriate toolkit installed. This depends on what you intend to run your code on. If you have an NVIDIA card, then you need to download and install the CUDA toolkit, which also contains the necessary binaries and libraries for OpenCL. Are you running Linux? If you believe you already have OpenCL installed, it could be that it is found at…

OpenCL - How do I query for a device's SIMD width?

主宰稳场 submitted on 2019-11-30 08:05:19
In CUDA, there is the concept of a warp, which is defined as the maximum number of threads that can execute the same instruction simultaneously within a single processing element. For NVIDIA, this warp size is 32 for all of their cards currently on the market. ATI cards have a similar concept, but the terminology in this context is wavefront. After some hunting around, I found out that the ATI card I have has a wavefront size of 64. My question is, what can I do to query for this SIMD width at runtime in OpenCL? Answer: I found the answer I was looking for. It turns out that you don't query…