OpenCL

How to pass vector parameter to OpenCL kernel in C?

只谈情不闲聊 submitted on 2021-02-07 08:26:03
Question: I'm having trouble passing a vector type (uint8) parameter to an OpenCL kernel function from the host code in C. In the host I've got the data in an array:

cl_uint dataArr[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };

(My real data is more than just [1, 8]; this is just for ease of explanation.) I then transfer the data over to a buffer to be passed to the kernel:

cl_mem kernelInputData = clCreateBuffer(context, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, sizeof(cl_uint)*8, dataArr, NULL);

Next, I pass this
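
A minimal host-side sketch of one common approach (not the poster's code; it assumes a hypothetical kernel declared as __kernel void foo(uint8 data, __global uint* out)): pack the eight values into a cl_uint8 and set it by value with clSetKernelArg instead of wrapping it in a cl_mem buffer.

    #include <CL/cl.h>

    // Hypothetical helper: pass eight host cl_uints as one uint8 kernel argument.
    static cl_int set_uint8_arg(cl_kernel kernel, cl_uint arg_index,
                                const cl_uint dataArr[8])
    {
        cl_uint8 vec;
        for (int i = 0; i < 8; ++i)
            vec.s[i] = dataArr[i];   // pack the host array into the OpenCL vector type
        // Vector-typed kernel parameters are set by value, not through clCreateBuffer.
        return clSetKernelArg(kernel, arg_index, sizeof(cl_uint8), &vec);
    }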

Array size and copy performance

扶醉桌前 submitted on 2021-02-07 03:18:31
Question: I'm sure this has been answered before, but I can't find a good explanation. I'm writing a graphics program where part of the pipeline is copying voxel data to OpenCL page-locked (pinned) memory. I found that this copy procedure is a bottleneck and made some measurements of the performance of a simple std::copy. The data is floats, and every chunk of data that I want to copy is around 64 MB in size. This is my original code, before any attempts at benchmarking:

std::copy(data, data
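
A minimal, hypothetical timing sketch (not from the post; dst is assumed to point at the mapped pinned region and n is the float count) that compares std::copy against memcpy as a baseline. With optimizations enabled both usually lower to the same bulk copy, so a large gap tends to point at the destination memory rather than the copy routine.

    #include <algorithm>
    #include <chrono>
    #include <cstdio>
    #include <cstring>

    // Hypothetical harness: time copying n floats from src into the pinned region dst.
    static void time_copies(const float* src, float* dst, std::size_t n)
    {
        using clock = std::chrono::steady_clock;
        auto t0 = clock::now();
        std::copy(src, src + n, dst);               // the copy used in the original code
        auto t1 = clock::now();
        std::memcpy(dst, src, n * sizeof(float));   // plain memcpy as a baseline
        auto t2 = clock::now();
        auto us = [](clock::duration d) {
            return std::chrono::duration_cast<std::chrono::microseconds>(d).count();
        };
        std::printf("std::copy: %lld us, memcpy: %lld us\n",
                    (long long)us(t1 - t0), (long long)us(t2 - t1));
    }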

OpenCL GPU Audio

天大地大妈咪最大 submitted on 2021-02-05 16:43:35
Question: There's not much on this subject, perhaps because it isn't a good idea in the first place. I want to create a realtime audio synthesis/processing engine that runs on the GPU. The reason for this is that I will also be using a physics library that runs on the GPU, and the audio output will be determined by the physics state. Is it true that the GPU can only carry audio output and can't generate it? Would this mean a large increase in latency, if I were to read the data back on the CPU and output

What is the difference between OpenCL and OpenGL's compute shader?

可紊 submitted on 2021-02-05 12:54:07
Question: I know OpenCL gives control of the GPU's memory architecture and thus allows better optimization, but, leaving this aside, can we use compute shaders for vector operations (addition, multiplication, inversion, etc.)?

Answer 1: In contrast to the other OpenGL shader types, compute shaders are not directly related to computer graphics; they provide a much more direct abstraction of the underlying hardware, similar to CUDA and OpenCL. They offer a customizable work-group size, shared memory, intra-group
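
A hypothetical example of that kind of vector operation as a compute shader (the GLSL source is embedded here as a C++ string literal; all names are illustrative, and buffer sizes are assumed to be a multiple of the work-group size). It shows the features the answer lists: an explicit local size, shared memory, and barrier() for intra-group synchronization.

    // Hypothetical compute shader: element-wise addition of two SSBO-backed arrays.
    static const char* kAddShaderSrc = R"GLSL(
    #version 430
    layout(local_size_x = 64) in;                        // customizable work-group size
    layout(std430, binding = 0) readonly  buffer A { float a[]; };
    layout(std430, binding = 1) readonly  buffer B { float b[]; };
    layout(std430, binding = 2) writeonly buffer C { float c[]; };
    shared float scratch[64];                            // per-work-group shared memory

    void main() {
        uint i = gl_GlobalInvocationID.x;
        scratch[gl_LocalInvocationID.x] = a[i] + b[i];
        barrier();                                       // intra-group synchronization
        c[i] = scratch[gl_LocalInvocationID.x];
    }
    )GLSL";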

PyOpenCL: how to create a local memory buffer?

蓝咒 submitted on 2021-02-05 07:36:59
Question: Probably an extremely simple question here, but I've been searching for hours with nothing to show for it. I have this piece of code, and I'd like to have the 256-bit (8 uint32) bitstring_gpu as a local-memory pointer in the device:

def Get_Bitstring_GPU_Buffer(ctx, bitstring):
    bitstring_gpu = cl.Buffer(ctx, mem_flags.READ_ONLY | mem_flags.COPY_HOST_PTR, hostbuf=bitstring)
    return bitstring_gpu

This is later used in a kernel call:

prg.get_active_hard_locations_64bit(queue, (HARD_LOCATIONS,), None,
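
Not PyOpenCL itself, but for reference, a minimal hypothetical sketch of the equivalent host-side call in the C API: a __local kernel parameter is given only a size and a NULL pointer (PyOpenCL wraps the same idea in a local-memory argument object such as pyopencl.LocalMemory).

    #include <CL/cl.h>

    // Hypothetical: reserve 32 bytes (8 x uint) of __local memory for a kernel
    // parameter declared as "__local uint* bitstring" in the kernel source.
    static cl_int set_local_bitstring_arg(cl_kernel kernel, cl_uint arg_index)
    {
        // For __local parameters the host passes only a size; arg_value must be NULL.
        return clSetKernelArg(kernel, arg_index, 8 * sizeof(cl_uint), NULL);
    }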

Char*** in OpenCL kernel argument?

纵然是瞬间 submitted on 2021-02-04 06:25:29
Question: I need to pass a vector<vector<string>> to an OpenCL kernel. What is the easiest way of doing it? Passing a char*** gives me an error:

__kernel void vadd( __global char*** sets, __global int* m, __global long* result) {}

ERROR: clBuildProgram(CL_BUILD_PROGRAM_FAILURE)

Answer 1: In OpenCL 1.x, this sort of thing is basically not possible. You'll need to convert your data so that it fits into a single buffer object, or at least into a fixed number of buffer objects. Pointers on the host don't make
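
A minimal sketch of the flattening the answer describes (hypothetical layout and names: all strings concatenated into one char array, with offset and length arrays alongside it, each of which can then be uploaded as its own buffer object):

    #include <cstdint>
    #include <string>
    #include <vector>

    // Hypothetical flattening of vector<vector<string>> into flat arrays that the
    // kernel can index as plain __global buffers instead of host pointers.
    struct FlatSets {
        std::vector<char>          chars;     // every string's bytes, back to back
        std::vector<std::uint32_t> offsets;   // start of each string inside 'chars'
        std::vector<std::uint32_t> lengths;   // length of each string
        std::vector<std::uint32_t> set_begin; // first string index of each set (+ end sentinel)
    };

    static FlatSets flatten(const std::vector<std::vector<std::string>>& sets)
    {
        FlatSets out;
        for (const auto& set : sets) {
            out.set_begin.push_back(static_cast<std::uint32_t>(out.offsets.size()));
            for (const auto& s : set) {
                out.offsets.push_back(static_cast<std::uint32_t>(out.chars.size()));
                out.lengths.push_back(static_cast<std::uint32_t>(s.size()));
                out.chars.insert(out.chars.end(), s.begin(), s.end());
            }
        }
        out.set_begin.push_back(static_cast<std::uint32_t>(out.offsets.size())); // end sentinel
        return out;
    }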

OpenCL CLK_LOCAL_MEM_FENCE causing abort trap 6

醉酒当歌 submitted on 2021-01-29 22:03:23
Question: I'm doing an exercise on convolution over images (info here) using OpenCL. When I use images whose size is not square (r x c), CLK_LOCAL_MEM_FENCE makes the program stop with abort trap 6. What I do is essentially fill up the local memory with the proper values, wait for this filling of local memory to finish by using barrier( CLK_LOCAL_MEM_FENCE ), and then calculate the values. It seems like when I use images like those I've described, barrier( CLK_LOCAL_MEM_FENCE
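
One common cause of that crash (hypothetical kernel fragment below, embedded as a C++ string literal with illustrative names): barrier(CLK_LOCAL_MEM_FENCE) must be reached by every work-item of a work-group, so the bounds check that a non-square image needs has to guard the memory accesses only, never the barrier itself.

    // Hypothetical kernel source: keep the barrier outside any divergent branch.
    static const char* kTileKernelSrc = R"CLC(
    __kernel void load_tile(__global const float* in, __global float* out,
                            int rows, int cols, __local float* tile)
    {
        int gx = get_global_id(0), gy = get_global_id(1);
        int lx = get_local_id(0),  ly = get_local_id(1);
        int lw = get_local_size(0);

        // Guard only the loads/stores; control flow must stay uniform for the barrier.
        if (gx < cols && gy < rows)
            tile[ly * lw + lx] = in[gy * cols + gx];
        else
            tile[ly * lw + lx] = 0.0f;

        barrier(CLK_LOCAL_MEM_FENCE);   // every work-item in the group must reach this

        if (gx < cols && gy < rows)
            out[gy * cols + gx] = tile[ly * lw + lx];   // real code would convolve here
    }
    )CLC";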

OpenCL clCreateContextFromType function results in memory leaks

限于喜欢 submitted on 2021-01-29 18:39:50
Question: I ran valgrind on one of my open-source OpenCL codes (https://github.com/fangq/mmc), and it detected a lot of memory leaks in the OpenCL host code. Most of them pointed back to the line where I create the context object using clCreateContextFromType. I double-checked all my OpenCL variables, command queues, kernels and programs, and made sure that they are all properly released, but still, when testing on sample programs, every call to the mmc_run_cl() function bumps up memory by 300 MB
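
For reference, a minimal hypothetical teardown sketch (not the fix from this thread): every clCreate*/clRetain* call needs exactly one matching clRelease*, and a missing clReleaseContext in particular keeps the driver's per-context allocations alive across repeated calls.

    #include <CL/cl.h>

    // Hypothetical cleanup: release objects in roughly the reverse order of creation.
    static void release_cl_objects(cl_mem buf, cl_kernel kernel, cl_program program,
                                   cl_command_queue queue, cl_context context)
    {
        clReleaseMemObject(buf);
        clReleaseKernel(kernel);
        clReleaseProgram(program);
        clReleaseCommandQueue(queue);
        clReleaseContext(context);   // without this, each run leaks the whole context
    }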