OpenCL

Difference Between HUGE_VALF and INFINITY Constants

Submitted by 我怕爱的太早我们不能终老 on 2019-12-10 05:28:32
Question: In OpenCL, there are two floating-point math constants that represent infinity. One of them is simply INFINITY. The other, HUGE_VALF, "evaluates to" infinity. What is the difference between the two? What does it mean to "evaluate to" infinity?

Answer 1: HUGE_VALF is a legacy name that accommodates floating-point systems that did not support infinities. For example, the C standard specifies that HUGE_VALF be returned in certain overflow cases. When a C implementation did not support infinities, HUGE_VALF could simply be the largest representable float value; on systems with IEEE-754 infinities, it evaluates to positive infinity and is effectively the same value as INFINITY.
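A minimal kernel sketch (illustrative, not from the original thread) showing that on devices with IEEE-754 infinities the two constants compare equal:

```c
// Illustrative OpenCL C sketch: on IEEE-754 devices,
// HUGE_VALF and INFINITY are the same value.
__kernel void compare_inf(__global int *same)
{
    float a = HUGE_VALF;   // "evaluates to" +infinity on IEEE systems
    float b = INFINITY;    // always +infinity
    *same = (a == b);      // expected to store 1 on conforming devices
}
```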

OpenCL online compilation: get assembly from cl::program or cl::kernel

Submitted by 点点圈 on 2019-12-10 05:18:36
Question: I'm running kernel benchmarks with OpenCL. I know that I can compile kernels offline with various tools from OpenCL vendors (e.g. ioc64 or poclcc). The problem is that I get performance results that I cannot explain from the assembly produced by these tools, the OpenCL runtime overhead, or similar factors. I would like to see the assembly of the online-compiled kernels that are actually compiled and executed by my benchmark program. Are there any ways to do that? My approach is to get this assembly somewhere from the cl::program or cl::kernel object.
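One common approach (a sketch, not taken from the thread): after clBuildProgram, query the program binaries with clGetProgramInfo; on Nvidia drivers the returned "binary" is typically PTX assembly text. The sketch assumes the program was built for exactly one device:

```c
/* Sketch: dump the binary/PTX of an online-compiled program.
 * Assumes `program` was already built for a single device;
 * error checking is trimmed for brevity. */
#include <stdio.h>
#include <stdlib.h>
#include <CL/cl.h>

void dump_program_binary(cl_program program)
{
    size_t size = 0;
    clGetProgramInfo(program, CL_PROGRAM_BINARY_SIZES,
                     sizeof(size), &size, NULL);

    unsigned char *binary = malloc(size);
    clGetProgramInfo(program, CL_PROGRAM_BINARIES,
                     sizeof(binary), &binary, NULL);

    FILE *f = fopen("kernel.ptx", "wb");  /* PTX text on Nvidia */
    fwrite(binary, 1, size, f);
    fclose(f);
    free(binary);
}
```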

Does OpenCL support boolean variables?

Submitted by 杀马特。学长 韩版系。学妹 on 2019-12-10 03:21:11
Question: Does OpenCL support boolean variables? I am currently using JOCL (Java) to write my OpenCL calling code, and I don't see anything about booleans.

Answer 1: Yes, but the size of a bool is not defined; therefore it has no associated API type (what size the value should be is device dependent). See section 6.1.1 "Built-in Scalar Data Types" of the OpenCL 1.1 specification for a list of supported scalar types. From section 6.8.k: arguments to __kernel functions in a program cannot be declared with the built-in scalar types bool, half, size_t, ptrdiff_t, intptr_t, and uintptr_t.
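A short sketch (illustrative, not from the thread) of the practical consequence: bool works inside a kernel, but a kernel argument has to use a sized type such as int or uchar:

```c
// Sketch: private bools are fine inside a kernel, but a bool
// kernel argument is illegal -- pass a sized int instead.
__kernel void threshold(__global const float *in,
                        __global float *out,
                        const int invert)      /* bool not allowed here */
{
    size_t i = get_global_id(0);
    bool above = in[i] > 0.0f;                 /* legal private bool */
    if (invert)
        above = !above;
    out[i] = above ? 1.0f : 0.0f;
}
```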

OpenCL double precision different from CPU double precision

Submitted by 荒凉一梦 on 2019-12-10 03:07:30
Question: I am programming in OpenCL using a GeForce GT 610 card on Linux, and my CPU and GPU double-precision results are not consistent. I can post part of the code here, but I would first like to know whether anyone else has faced this problem. The difference between the GPU and CPU double-precision results becomes pronounced when I run loops with many iterations. There is really nothing special about the code, but I can post it here if anyone is interested. Thanks a lot. Here is my code. Please excuse the
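One frequent cause (an assumption about this particular case, but a well-known effect): the GPU compiler may contract a*b + c into a single fused multiply-add, which rounds once, while the CPU code rounds twice, so iterated loops drift apart. A sketch of making the rounding behavior explicit with the fma() builtin:

```c
/* Sketch: fma() rounds once; a * b + acc[i] may round twice on
 * the CPU. Writing fma() explicitly (and not passing the
 * "-cl-mad-enable" build option, which permits faster, less
 * accurate mad substitutions) makes the intent explicit. */
#pragma OPENCL EXTENSION cl_khr_fp64 : enable

__kernel void accumulate(__global double *acc,
                         const double a, const double b)
{
    size_t i = get_global_id(0);
    acc[i] = fma(a, b, acc[i]);   /* single rounding step */
}
```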

Multiple OpenCL Kernels

Submitted by 荒凉一梦 on 2019-12-09 23:52:14
Question: I just wanted to ask if somebody can give me a heads-up on what to pay attention to when running several simple kernels one after another. Can I use the same CommandQueue? Can I just call clCreateProgramWithSource several times, producing a different cl_program each time? What did I forget? Thanks!

Answer 1: You can either create and compile several programs (and create kernel objects from those), or you can put all kernels into the same program (clCreateProgramWithSource takes several source strings, after all).
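A sketch of the single-program route (illustrative names; assumes a context, device, in-order queue, buffer buf, and global_size already exist):

```c
/* Sketch: one program holding two kernels, both run on one queue. */
const char *src =
    "__kernel void scale(__global float *d, float f)"
    "{ d[get_global_id(0)] *= f; }\n"
    "__kernel void shift(__global float *d, float s)"
    "{ d[get_global_id(0)] += s; }\n";

cl_int err;
cl_program program = clCreateProgramWithSource(context, 1, &src, NULL, &err);
err = clBuildProgram(program, 1, &device, NULL, NULL, NULL);

cl_kernel scale = clCreateKernel(program, "scale", &err);
cl_kernel shift = clCreateKernel(program, "shift", &err);

float two = 2.0f, one = 1.0f;
clSetKernelArg(scale, 0, sizeof(cl_mem), &buf);
clSetKernelArg(scale, 1, sizeof(float), &two);
clSetKernelArg(shift, 0, sizeof(cl_mem), &buf);
clSetKernelArg(shift, 1, sizeof(float), &one);

/* An in-order queue runs the kernels in submission order. */
clEnqueueNDRangeKernel(queue, scale, 1, NULL, &global_size, NULL, 0, NULL, NULL);
clEnqueueNDRangeKernel(queue, shift, 1, NULL, &global_size, NULL, 0, NULL, NULL);
```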

OpenCL crashes on call to clGetPlatformIDs

Submitted by 若如初见. on 2019-12-09 18:27:37
Question: I am new to OpenCL, working on a Core i5 machine with Intel(R) HD Graphics 4000 and running Windows 7. I installed the newest Intel driver with OpenCL support, and GpuCapsViewer confirms that OpenCL is set up. I developed a simple HelloWorld program using the Intel OpenCL SDK. It compiles successfully, but when run it crashes with a segmentation fault on the call to clGetPlatformIDs(). This is my code: #include <iostream> #include <CL/opencl.h> int main() { std::cout << "Test OCL
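For reference, a minimal self-contained reproduction in plain C (a sketch, not the asker's code, which is truncated above). A crash inside clGetPlatformIDs itself usually points at an ICD/driver mismatch rather than at user code, so checking the return value and platform count first helps isolate that:

```c
/* Sketch: defensive platform enumeration. */
#include <stdio.h>
#include <CL/cl.h>

int main(void)
{
    cl_uint num_platforms = 0;
    cl_int err = clGetPlatformIDs(0, NULL, &num_platforms);
    if (err != CL_SUCCESS) {
        fprintf(stderr, "clGetPlatformIDs failed: %d\n", err);
        return 1;
    }
    printf("Found %u OpenCL platform(s)\n", num_platforms);
    return 0;
}
```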

Is clGetKernelWorkGroupInfo - CL_KERNEL_WORK_GROUP_SIZE the size OpenCL uses when not specifying it in clEnqueueNDRangeKernel?

Submitted by 元气小坏坏 on 2019-12-09 16:43:34
Question: I read that when the work-group size is not specified when enqueueing a kernel, OpenCL chooses one for me, e.g.:

// don't know which work-group size OpenCL will use!
clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &global_size, NULL, 0, NULL, NULL);

Is there a way to get the work-group size OpenCL is using here? Is the work-group size OpenCL chooses the one returned by clGetKernelWorkGroupInfo? Thank you in advance!

Answer 1: CL_KERNEL_WORK_GROUP_SIZE is the MAXIMUM work-group size you can use for that kernel, not necessarily the size the runtime actually picks when you pass NULL.
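A host-side sketch (illustrative) of the related queries, assuming kernel and device handles already exist:

```c
/* Sketch: per-kernel work-group limits from clGetKernelWorkGroupInfo. */
size_t max_wg = 0, preferred_multiple = 0;

clGetKernelWorkGroupInfo(kernel, device, CL_KERNEL_WORK_GROUP_SIZE,
                         sizeof(max_wg), &max_wg, NULL);
clGetKernelWorkGroupInfo(kernel, device,
                         CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE,
                         sizeof(preferred_multiple), &preferred_multiple, NULL);

printf("max work-group size: %zu, preferred multiple: %zu\n",
       max_wg, preferred_multiple);
```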

OpenCL local memory size and number of compute units

Submitted by 北慕城南 on 2019-12-09 16:41:01
Question: Each GPU device (AMD, Nvidia, or any other) is split into several compute units (multiprocessors), each of which has a fixed number of cores (vertex shaders/stream processors). So one has (compute units) x (cores per compute unit) simultaneous processors to compute with, but there is only a small, fixed amount of __local memory (usually 16 KB or 32 KB) available per multiprocessor. Hence, the exact number of these multiprocessors matters. Now my questions: (a) How can I know the number of multiprocessors (compute units) on my device?
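Both numbers can be queried at run time with clGetDeviceInfo (a sketch; assumes a valid device handle):

```c
/* Sketch: query compute units and per-work-group __local memory. */
cl_uint compute_units = 0;
cl_ulong local_mem = 0;

clGetDeviceInfo(device, CL_DEVICE_MAX_COMPUTE_UNITS,
                sizeof(compute_units), &compute_units, NULL);
clGetDeviceInfo(device, CL_DEVICE_LOCAL_MEM_SIZE,
                sizeof(local_mem), &local_mem, NULL);

printf("compute units: %u, __local memory: %llu bytes\n",
       compute_units, (unsigned long long)local_mem);
```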

What's the correct and most efficient way to use the mapped (zero-copy) memory mechanism in an Nvidia OpenCL environment?

Submitted by 僤鯓⒐⒋嵵緔 on 2019-12-09 14:07:12
Question: Nvidia offers an example of how to profile bandwidth between host and device; you can find the code here: https://developer.nvidia.com/opencl (search for "bandwidth"). The experiment was carried out on a 64-bit Ubuntu 12.04 machine. I am inspecting pinned memory and the mapped access mode, which can be tested by invoking: ./bandwidthtest --memory=pinned --access=mapped The core test loop for host-to-device bandwidth is at around lines 736-748. I also list them here and add some comments and
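The usual pinned + mapped (zero-copy) pattern looks like this (a sketch; the buffer size and the context/queue handles are assumptions, not the sample's code):

```c
/* Sketch: allocate pinned-capable memory, then map it for host
 * access instead of copying. Assumes context and queue exist. */
cl_int err;
size_t bytes = 1 << 20;
cl_mem buf = clCreateBuffer(context,
                            CL_MEM_READ_WRITE | CL_MEM_ALLOC_HOST_PTR,
                            bytes, NULL, &err);

float *host_ptr = (float *)clEnqueueMapBuffer(queue, buf, CL_TRUE,
                                              CL_MAP_WRITE, 0, bytes,
                                              0, NULL, NULL, &err);
/* ... write into host_ptr; no explicit transfer needed ... */
clEnqueueUnmapMemObject(queue, buf, host_ptr, 0, NULL, NULL);
```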

How many memory latency cycles per memory access type in OpenCL/CUDA?

Submitted by 孤者浪人 on 2019-12-09 12:57:45
Question: I looked through the programming guide and best practices guide, and they mention that global memory access takes 400-600 cycles. I did not see much on the other memory types, like texture cache, constant cache, and shared memory. Registers have zero memory latency. I think constant cache is the same as registers if all threads use the same address in the constant cache; about the worst case I am not so sure. Shared memory is the same as registers so long as there are no bank conflicts? If there are, then how does
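As a concrete illustration of the bank-conflict point (a sketch, not from the thread): __local/shared memory only approaches register-like latency when consecutive work-items hit different banks:

```c
/* Sketch: conflict-free vs. conflicting __local access patterns. */
__kernel void local_access(__global const float *in,
                           __global float *out)
{
    __local float tile[256];          /* assumes work-group size <= 256 */
    size_t lid = get_local_id(0);
    size_t gid = get_global_id(0);

    tile[lid] = in[gid];              /* stride 1: one bank per work-item */
    barrier(CLK_LOCAL_MEM_FENCE);

    /* A stride such as tile[(lid * 32) % 256] would map many
     * work-items to the same bank on 32-bank hardware and
     * serialize the accesses. */
    out[gid] = tile[lid];
}
```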